Linux-Fsdevel Archive on lore.kernel.org
* [PATCH ghak90 V9 00/13] audit: implement container identifier
@ 2020-06-27 13:20 Richard Guy Briggs
  2020-06-27 13:20 ` [PATCH ghak90 V9 01/13] audit: collect audit task parameters Richard Guy Briggs
                   ` (12 more replies)
  0 siblings, 13 replies; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Implement kernel audit container identifier.

This patchset is the eighth based on the proposal document (V4) posted at:
	https://www.redhat.com/archives/linux-audit/2019-September/msg00052.html

The first patch was the last patch from ghak81 that was absorbed into
this patchset since its primary justification is the rest of this
patchset.

The second patch implements the proc fs write to set the audit container
identifier of a process, emitting an AUDIT_CONTAINER_OP record to
announce the registration of that audit container identifier on that
process.  This patch requires userspace support for record acceptance
and proper type display.  This patch now includes the conversion 
over from a simple u64 to a list member that includes owner information
to check for descendancy, allow process injection into a container and
prevent id reuse by other orchestrators.

The third implements reading the audit container identifier from the
proc filesystem for debugging.  This patch wasn't planned for upstream
inclusion but is starting to become more likely.

The fourth logs the drop of an audit container identifier once all tasks
using that audit container identifier have exited.

The 5th implements the auxiliary record AUDIT_CONTAINER_ID if an audit
container identifier is associated with an event.  This patch requires
userspace support for proper type display.

The 6th adds audit daemon signalling provenance through audit_sig_info2.

The 7th creates a local audit context to be able to bind a standalone
record with a locally created auxiliary record.

The 8th patch adds audit container identifier records to the user
standalone records.

The 9th adds audit container identifier filtering to the exit,
exclude and user lists.  This patch adds the AUDIT_CONTID field and
requires auditctl userspace support for the --contid option.

The 10th adds network namespace audit container identifier labelling
based on member tasks' audit container identifier labels which supports
standalone netfilter records that don't have a task context and lists
each container to which that net namespace belongs.

The 11th checks that the target is a descendant for nesting and
refactors to avoid a duplicate of the copied function.

The 12th adds tracking and reporting for container nesting.  
This enables kernel filtering and userspace searches of nested audit
container identifiers.

The 13th adds a mechanism to allow a process to be designated as a
container orchestrator/engine in non-init user namespaces.


Example: Set an audit container identifier of 123456 to the "sleep" task:

  sleep 2&
  child=$!
  echo 123456 > /proc/$child/audit_containerid; echo $?
  ausearch -ts recent -m container_op
  echo child:$child contid:$( cat /proc/$child/audit_containerid)

This should produce a record such as:

  type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615


Example: Set a filter on an audit container identifier 123459 on /tmp/tmpcontainerid:

  contid=123459
  key=tmpcontainerid
  auditctl -a exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
  perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\"); close(\$tmpfile);" &
  child=$!
  echo $contid > /proc/$child/audit_containerid
  sleep 2
  ausearch -i -ts recent -k $key
  auditctl -d exit,always -F dir=/tmp -F perm=wa -F contid=$contid -F key=$key
  rm -f /tmp/$key

This should produce an event such as:

  type=CONTAINER_ID msg=audit(2018-06-06 12:46:31.707:26953) : contid=123459
  type=PROCTITLE msg=audit(2018-06-06 12:46:31.707:26953) : proctitle=perl -e sleep 1; open(my $tmpfile, '>', "/tmp/tmpcontainerid"); close($tmpfile);
  type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=1 name=/tmp/tmpcontainerid inode=25656 dev=00:26 mode=file,644 ouid=root ogid=root rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
  type=PATH msg=audit(2018-06-06 12:46:31.707:26953) : item=0 name=/tmp/ inode=8985 dev=00:26 mode=dir,sticky,777 ouid=root ogid=root rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
  type=CWD msg=audit(2018-06-06 12:46:31.707:26953) : cwd=/root
  type=SYSCALL msg=audit(2018-06-06 12:46:31.707:26953) : arch=x86_64 syscall=openat success=yes exit=3 a0=0xffffffffffffff9c a1=0x5621f2b81900 a2=O_WRONLY|O_CREAT|O_TRUNC a3=0x1b6 items=2 ppid=628 pid=2232 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=ttyS0 ses=1 comm=perl exe=/usr/bin/perl subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=tmpcontainerid

Example: Test multiple containers on one netns:

  sleep 5 &
  child1=$!
  containerid1=123451
  echo $containerid1 > /proc/$child1/audit_containerid
  sleep 5 &
  child2=$!
  containerid2=123452
  echo $containerid2 > /proc/$child2/audit_containerid
  iptables -I INPUT -i lo -p icmp --icmp-type echo-request -j AUDIT --type accept
  iptables -I INPUT  -t mangle -i lo -p icmp --icmp-type echo-request -j MARK --set-mark 0x12345555
  sleep 1;
  bash -c "ping -q -c 1 127.0.0.1 >/dev/null 2>&1"
  sleep 1;
  ausearch -i -m NETFILTER_PKT -ts boot|grep mark=0x12345555
  ausearch -i -m NETFILTER_PKT -ts boot|grep contid=|grep $containerid1|grep $containerid2

This would produce an event such as:

  type=NETFILTER_PKT msg=audit(03/15/2019 14:16:13.369:244) : mark=0x12345555 saddr=127.0.0.1 daddr=127.0.0.1 proto=icmp
  type=CONTAINER_ID msg=audit(03/15/2019 14:16:13.369:244) : contid=123452,123451
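
For tooling that consumes such records, the comma-delimited contid field
could be split along the lines of the userspace sketch below.  This is an
illustration only, not code from this patchset: contid_list_parse() is a
hypothetical helper, it handles plain decimal lists as in the record above,
and it does not attempt to interpret any nesting modifiers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patchset): locate the first
 * "contid=" in a record line and parse the comma-separated decimal
 * identifiers that follow into ids[].
 *
 * Returns the number of identifiers stored (at most max), or -1 if the
 * field is missing or an identifier does not start with a digit.
 */
static int contid_list_parse(const char *record, uint64_t *ids, int max)
{
	const char *p = strstr(record, "contid=");
	int n = 0;

	if (!p)
		return -1;
	p += strlen("contid=");

	while (n < max) {
		char *end;

		if (*p < '0' || *p > '9')
			return -1;	/* empty or malformed element */
		ids[n++] = strtoull(p, &end, 10);
		if (*end != ',')
			break;		/* last element of the list */
		p = end + 1;
	}
	return n;
}
```

Applied to the CONTAINER_ID line above, this would store 123452 and 123451
in that order.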


Includes the last patch of https://github.com/linux-audit/audit-kernel/issues/81
Please see the github audit kernel issue for the main feature:
  https://github.com/linux-audit/audit-kernel/issues/90
and the kernel filter code:
  https://github.com/linux-audit/audit-kernel/issues/91
and the network support:
  https://github.com/linux-audit/audit-kernel/issues/92
Please see the github audit userspace issue for supporting record types:
  https://github.com/linux-audit/audit-userspace/issues/51
and filter code:
  https://github.com/linux-audit/audit-userspace/issues/40
Please see the github audit testsuite issue for the test case:
  https://github.com/linux-audit/audit-testsuite/issues/64
  https://github.com/rgbriggs/audit-testsuite/tree/ghat64-contid
  https://github.com/linux-audit/audit-testsuite/pull/91
Please see the github audit wiki for the feature overview:
  https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID

The code is also posted at:
  git://toccata2.tricolour.ca/linux-2.6-rgb.git ghak90-audit-containerID.v9

Changelog:
v9
- rebase on v5.8-rc1
- fix whitespace and oversize lines where practicable
- remove harmless duplicate S_IRUSR in capcontid
- return -EBUSY for both threading and children (drop -EALREADY)
- return -EEXIST if already set and not nesting (drop -ECHILD)
- fix unbalanced brace and remove elseif ladder
- drop check for same contid set again as redundant (drop -EADDRINUSE)
- get reference to contobj's parent taskstruct
- protect all contid list updates with audit_contobj_list_lock
- protect refcounts with rcu read lock
- convert _audit_contobj to _audit_contobj_get, which calls _audit_contobj_hold
- convert audit_log_container_id() and audit_log_contid() from u64 to contobj, simplifying
- issue death certificate on contid after exit of last task
- keep contobj ref to block reuse with -ESHUTDOWN until auditd exit or signal info
- report all contids nested
- rework sig_info2 format to accommodate contid list
- fix zero-length array in include/linux/audit.h struct audit_sig_info2 data[]
- found bug in audit_alloc_local, don't check audit_ever_enabled, since all callers check audit_enabled
- remove warning at declaration of audit_sig_cid of reuse since reuse is now blocked
- report descendancy checking errcodes under -EXDEV (drop -EBADSLT)
- add missed check, replace audit_contid_isowner with audit_contid_isnesting
- limit calls to audit_log_format() with if(iter->parent) ...
- list only one contid in contid, nested in old-contid to avoid duplication
- switch to comma delimiter, caret modifier in nested contid list
- special case -1 for AUDIT_CID_UNSET printing
- drop contid depth limit and netns contid limit patches
- enforce capcontid policy on contid write and read
- squash conversion to contobj into contid intro patch

v8
- rebase on v5.5-rc1 audit/next
- remove subject attrs in CONTAINER_OP record
- group audit_contid_list_lock with audit_contid_hash
- in audit_{set,log}_contid(), break out of loop after finding target
- use target var to size kmalloc
- rework audit_cont_owner() to bool audit_contid_isowner() and move to where used
- create static void audit_cont_hold(struct audit_contobj *cont) { refcount_inc(&cont->refcount); }
- rename audit_cont{,_*} refs to audit_contobj{,_*}
- prefix special local functions with _ [audit_contobj*()]
- protect contid list traversals with rcu_read_lock() and updates with audit_contid_list_lock
- protect real_parent in audit_contid_depth() with rcu_dereference
- give new contid field nesting format in patch description
- squash task_is_descendant()
- squash support for NETFILTER_PKT into network namespaces
- limit nesting depth based on record length overflow, bandwidth and storage
- implement control for audit container identifier nesting depth limit
- make room for audit_bpf patches (bump CONTAINER_ID to 1335)
- squash proc interface into capcontid
- remove netlink access to loginuid/sessionid/contid/capcontid
- delete 32k contid limit patch
- document potential overlap between signal delivery and contid reuse
- document audit_contobj_list_lock coverage
- document disappearing orch task injection limitation
- limit the number of containers that can be associated with a network namespace
- implement control for audit container identifier netns count limit

v7
- remove BUG() in audit_comparator64()
- rebase on v5.2-rc1 audit/next
- resolve merge conflict with ghak111 (signal_info regardless syscall)
- resolve merge conflict with ghak73 (audit_field_valid)
- resolve merge conflict with ghak64 (saddr_fam filter)
- resolve merge conflict with ghak10 (ntp audit) change AUDIT_CONTAINER_ID from 1332 to 1334
- rebase on v5.3-rc1 audit/next
- track container owner
- only permit setting contid of descendants for nesting
- track drop of contid and permit reuse
- track and report container nesting
- permit filtering on any nested contid
- set/get contid and loginuid/sessionid via netlink
- implement capcontid to enable orchestrators in non-init user
  namespaces
- limit number of containers
- limit depth of container nesting

v6
- change TMPBUFLEN from 11 to 21 to cover the decimal value of contid
  u64 (nhorman)
- fix bug overwriting ctx in struct audit_sig_info, move cid above
  ctx[0] (nhorman)
- fix bug skipping remaining fields and not advancing bufp when copying
  out contid in audit_krule_to_data (omosnace)
- add acks, tidy commit descriptions, other formatting fixes (checkpatch
  wrong on audit_log_lost)
- cast ull for u64 prints
- target_cid tracking was moved from the ptrace/signal patch to
  container_op
- target ptrace and signal records were moved from the ptrace/signal
  patch to container_id
- auditd signaller tracking was moved to a new AUDIT_SIGNAL_INFO2
  request and record
- ditch unnecessary list_empty() checks
- check for null net and aunet in audit_netns_contid_add()
- swap CONTAINER_OP contid/old-contid order to ease parsing

v5
- address loginuid and sessionid syscall scope in ghak104
- address audit_context in CONFIG_AUDIT vs CONFIG_AUDITSYSCALL in ghak105
- remove tty patch, addressed in ghak106
- rebase on audit/next v5.0-rc1
  w/ghak59/ghak104/ghak103/ghak100/ghak107/ghak105/ghak106/ghak105sup
- update CONTAINER_ID to CONTAINER_OP in patch description
- move audit_context in audit_task_info to CONFIG_AUDITSYSCALL
- move audit_alloc() and audit_free() out of CONFIG_AUDITSYSCALL and into
  CONFIG_AUDIT and create audit_{alloc,free}_syscall
- use plain kmem_cache_alloc() rather than kmem_cache_zalloc() in audit_alloc()
- fix audit_get_contid() declaration type error
- move audit_set_contid() from auditsc.c to audit.c
- audit_log_contid() returns void
- audit_log_contid() handed contid rather than tsk
- switch from AUDIT_CONTAINER to AUDIT_CONTAINER_ID for aux record
- move audit_log_contid(tsk/contid) & audit_contid_set(tsk)/audit_contid_valid(contid)
- switch from tsk to current
- audit_alloc_local() calls audit_log_lost() on failure to allocate a context
- add AUDIT_USER* non-syscall contid record
- cosmetic cleanup double parens, goto out on err
- ditch audit_get_ns_contid_list_lock(), fix aunet lock race
- switch from all-cpu read spinlock to rcu, keep spinlock for write
- update audit_alloc_local() to use ktime_get_coarse_real_ts64()
- add nft_log support
- add call from do_exit() in audit_free() to remove contid from netns
- relegate AUDIT_CONTAINER ref= field (was op=) to debug patch

v4
- preface set with ghak81:"collect audit task parameters"
- add shallyn and sgrubb acks
- rename feature bitmap macro
- rename cid_valid() to audit_contid_valid()
- rename AUDIT_CONTAINER_ID to AUDIT_CONTAINER_OP
- delete audit_get_contid_list() from headers
- move work into inner if, delete "found"
- change netns contid list function names
- move exports for audit_log_contid audit_alloc_local audit_free_context to non-syscall patch
- list contids CSV
- pass in gfp flags to audit_alloc_local() (fix audit_alloc_context callers)
- use "local" in lieu of abusing in_syscall for auditsc_get_stamp()
- read_lock(&tasklist_lock) around children and thread check
- task_lock(tsk) should be taken before first check of tsk->audit
- add spin lock to contid list in aunet
- restrict /proc read to CAP_AUDIT_CONTROL
- remove set again prohibition and inherited flag
- delete contidion spelling fix from patchset, send to netdev/linux-wireless

v3
- switched from containerid in task_struct to audit_task_info (depends on ghak81)
- drop INVALID_CID in favour of only AUDIT_CID_UNSET
- check for !audit_task_info, throw -ENOPROTOOPT on set
- changed -EPERM to -EEXIST for parent check
- return AUDIT_CID_UNSET if !audit_enabled
- squash child/thread check patch into AUDIT_CONTAINER_ID patch
- changed -EPERM to -EBUSY for child check
- separate child and thread checks, use -EALREADY for latter
- move addition of op= from ptrace/signal patch to AUDIT_CONTAINER patch
- fix && to || bashism in ptrace/signal patch
- uninline and export function for audit_free_context()
- drop CONFIG_CHANGE, FEATURE_CHANGE, ANOM_ABEND, ANOM_SECCOMP patches
- move audit_enabled check (xt_AUDIT)
- switched from containerid list in struct net to net_generic's struct audit_net
- move containerid list iteration into audit (xt_AUDIT)
- create function to move namespace switch into audit
- switched /proc/PID/ entry from containerid to audit_containerid
- call kzalloc with GFP_ATOMIC on in_atomic() in audit_alloc_context()
- call kzalloc with GFP_ATOMIC on in_atomic() in audit_log_container_info()
- use xt_net(par) instead of sock_net(skb->sk) to get net
- switched record and field names: initial CONTAINER_ID, aux CONTAINER, field CONTID
- allow to set own contid
- open code audit_set_containerid
- add contid inherited flag
- ccontainerid and pcontainerid eliminated due to inherited flag
- change name of container list functions
- rename containerid to contid
- convert initial container record to syscall aux
- fix spelling mistake of contidion in net/rfkill/core.c to avoid contid name collision

v2
- add check for children and threads
- add network namespace container identifier list
- add NETFILTER_PKT audit container identifier logging
- patch description and documentation clean-up and example
- reap unused ppid

Richard Guy Briggs (13):
  audit: collect audit task parameters
  audit: add container id
  audit: read container ID of a process
  audit: log drop of contid on exit of last task
  audit: log container info of syscalls
  audit: add contid support for signalling the audit daemon
  audit: add support for non-syscall auxiliary records
  audit: add containerid support for user records
  audit: add containerid filtering
  audit: add support for containerid to network namespaces
  audit: contid check descendancy and nesting
  audit: track container nesting
  audit: add capcontid to set contid outside init_user_ns

 fs/proc/base.c              | 112 +++++++-
 include/linux/audit.h       | 135 +++++++++-
 include/linux/sched.h       |  10 +-
 include/uapi/linux/audit.h  |  10 +-
 init/init_task.c            |   3 +-
 init/main.c                 |   2 +
 kernel/audit.c              | 621 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/audit.h              |  23 ++
 kernel/auditfilter.c        |  61 +++++
 kernel/auditsc.c            | 110 ++++++--
 kernel/fork.c               |   1 -
 kernel/nsproxy.c            |   4 +
 kernel/sched/core.c         |  33 +++
 net/netfilter/nft_log.c     |  11 +-
 net/netfilter/xt_AUDIT.c    |  11 +-
 security/selinux/nlmsgtab.c |   1 +
 security/yama/yama_lsm.c    |  33 ---
 17 files changed, 1085 insertions(+), 96 deletions(-)

-- 
1.8.3.1



* [PATCH ghak90 V9 01/13] audit: collect audit task parameters
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:09   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 02/13] audit: add container id Richard Guy Briggs
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

The audit-related parameters in struct task_struct should ideally be
collected together and accessed through a standard audit API.

Collect the existing loginuid, sessionid and audit_context together in a
new struct audit_task_info called "audit" in struct task_struct.

Use kmem_cache to manage this pool of memory.
Un-inline audit_free() to be able to always recover that memory.
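
As a rough userspace illustration of the accessor pattern described above
(not the kernel code itself, which is in the diff below): every sketch_*
name here is hypothetical, malloc()/free() stand in for the kmem_cache, and
the all-ones sentinel values are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed sentinel values for the sketch. */
#define SKETCH_UID_INVALID ((unsigned int)-1)
#define SKETCH_SID_UNSET   ((unsigned int)-1)

/* Per-task audit state gathered in one separately allocated block. */
struct sketch_audit_info {
	unsigned int loginuid;
	unsigned int sessionid;
};

struct sketch_task {
	struct sketch_audit_info *audit;	/* may be NULL */
};

/* Accessors tolerate a missing block and fall back to the sentinels. */
static unsigned int sketch_get_loginuid(const struct sketch_task *t)
{
	return t->audit ? t->audit->loginuid : SKETCH_UID_INVALID;
}

static unsigned int sketch_get_sessionid(const struct sketch_task *t)
{
	return t->audit ? t->audit->sessionid : SKETCH_SID_UNSET;
}

/* Allocate the child's block and inherit the parent's values. */
static int sketch_audit_alloc(struct sketch_task *child,
			      const struct sketch_task *parent)
{
	struct sketch_audit_info *info = malloc(sizeof(*info));

	if (!info)
		return -1;
	info->loginuid = sketch_get_loginuid(parent);
	info->sessionid = sketch_get_sessionid(parent);
	child->audit = info;
	return 0;
}

/* Release the block; later accessor calls see the sentinels again. */
static void sketch_audit_free(struct sketch_task *t)
{
	free(t->audit);
	t->audit = NULL;
}
```

Callers only ever go through the accessors, so moving the fields out of the
task structure does not change how they are read.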

Please see the upstream github issue
https://github.com/linux-audit/audit-kernel/issues/81

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 include/linux/audit.h | 49 +++++++++++++++++++++++------------
 include/linux/sched.h |  7 +----
 init/init_task.c      |  3 +--
 init/main.c           |  2 ++
 kernel/audit.c        | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--
 kernel/audit.h        |  5 ++++
 kernel/auditsc.c      | 26 ++++++++++---------
 kernel/fork.c         |  1 -
 8 files changed, 124 insertions(+), 40 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 3fcd9ee49734..c2150415f9df 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -100,6 +100,16 @@ enum audit_nfcfgop {
 	AUDIT_XT_OP_UNREGISTER,
 };
 
+struct audit_task_info {
+	kuid_t			loginuid;
+	unsigned int		sessionid;
+#ifdef CONFIG_AUDITSYSCALL
+	struct audit_context	*ctx;
+#endif
+};
+
+extern struct audit_task_info init_struct_audit;
+
 extern int is_audit_feature_set(int which);
 
 extern int __init audit_register_class(int class, unsigned *list);
@@ -136,6 +146,9 @@ enum audit_nfcfgop {
 #ifdef CONFIG_AUDIT
 /* These are defined in audit.c */
 				/* Public API */
+extern int  audit_alloc(struct task_struct *task);
+extern void audit_free(struct task_struct *task);
+extern void __init audit_task_init(void);
 extern __printf(4, 5)
 void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
 	       const char *fmt, ...);
@@ -179,12 +192,16 @@ extern void		    audit_log_path_denied(int type,
 
 static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
 {
-	return tsk->loginuid;
+	if (!tsk->audit)
+		return INVALID_UID;
+	return tsk->audit->loginuid;
 }
 
 static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 {
-	return tsk->sessionid;
+	if (!tsk->audit)
+		return AUDIT_SID_UNSET;
+	return tsk->audit->sessionid;
 }
 
 extern u32 audit_enabled;
@@ -192,6 +209,14 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 extern int audit_signal_info(int sig, struct task_struct *t);
 
 #else /* CONFIG_AUDIT */
+static inline int audit_alloc(struct task_struct *task)
+{
+	return 0;
+}
+static inline void audit_free(struct task_struct *task)
+{ }
+static inline void __init audit_task_init(void)
+{ }
 static inline __printf(4, 5)
 void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
 	       const char *fmt, ...)
@@ -267,8 +292,6 @@ static inline int audit_signal_info(int sig, struct task_struct *t)
 
 /* These are defined in auditsc.c */
 				/* Public API */
-extern int  audit_alloc(struct task_struct *task);
-extern void __audit_free(struct task_struct *task);
 extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
 				  unsigned long a2, unsigned long a3);
 extern void __audit_syscall_exit(int ret_success, long ret_value);
@@ -288,12 +311,14 @@ extern void audit_seccomp_actions_logged(const char *names,
 
 static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
 {
-	task->audit_context = ctx;
+	task->audit->ctx = ctx;
 }
 
 static inline struct audit_context *audit_context(void)
 {
-	return current->audit_context;
+	if (!current->audit)
+		return NULL;
+	return current->audit->ctx;
 }
 
 static inline bool audit_dummy_context(void)
@@ -301,11 +326,7 @@ static inline bool audit_dummy_context(void)
 	void *p = audit_context();
 	return !p || *(int *)p;
 }
-static inline void audit_free(struct task_struct *task)
-{
-	if (unlikely(task->audit_context))
-		__audit_free(task);
-}
+
 static inline void audit_syscall_entry(int major, unsigned long a0,
 				       unsigned long a1, unsigned long a2,
 				       unsigned long a3)
@@ -533,12 +554,6 @@ static inline void audit_log_nfcfg(const char *name, u8 af,
 extern int audit_n_rules;
 extern int audit_signals;
 #else /* CONFIG_AUDITSYSCALL */
-static inline int audit_alloc(struct task_struct *task)
-{
-	return 0;
-}
-static inline void audit_free(struct task_struct *task)
-{ }
 static inline void audit_syscall_entry(int major, unsigned long a0,
 				       unsigned long a1, unsigned long a2,
 				       unsigned long a3)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b62e6aaf28f0..2213ac670386 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,7 +34,6 @@
 #include <linux/kcsan.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
-struct audit_context;
 struct backing_dev_info;
 struct bio_list;
 struct blk_plug;
@@ -946,11 +945,7 @@ struct task_struct {
 	struct callback_head		*task_works;
 
 #ifdef CONFIG_AUDIT
-#ifdef CONFIG_AUDITSYSCALL
-	struct audit_context		*audit_context;
-#endif
-	kuid_t				loginuid;
-	unsigned int			sessionid;
+	struct audit_task_info		*audit;
 #endif
 	struct seccomp			seccomp;
 
diff --git a/init/init_task.c b/init/init_task.c
index 15089d15010a..92d34c4b7702 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -130,8 +130,7 @@ struct task_struct init_task
 	.thread_group	= LIST_HEAD_INIT(init_task.thread_group),
 	.thread_node	= LIST_HEAD_INIT(init_signals.thread_head),
 #ifdef CONFIG_AUDIT
-	.loginuid	= INVALID_UID,
-	.sessionid	= AUDIT_SID_UNSET,
+	.audit		= &init_struct_audit,
 #endif
 #ifdef CONFIG_PERF_EVENTS
 	.perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
diff --git a/init/main.c b/init/main.c
index 0ead83e86b5a..349470ad7458 100644
--- a/init/main.c
+++ b/init/main.c
@@ -96,6 +96,7 @@
 #include <linux/jump_label.h>
 #include <linux/mem_encrypt.h>
 #include <linux/kcsan.h>
+#include <linux/audit.h>
 
 #include <asm/io.h>
 #include <asm/bugs.h>
@@ -1028,6 +1029,7 @@ asmlinkage __visible void __init start_kernel(void)
 	nsfs_init();
 	cpuset_init();
 	cgroup_init();
+	audit_task_init();
 	taskstats_init_early();
 	delayacct_init();
 
diff --git a/kernel/audit.c b/kernel/audit.c
index 8c201f414226..5d8147a29291 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -203,6 +203,73 @@ struct audit_reply {
 	struct sk_buff *skb;
 };
 
+static struct kmem_cache *audit_task_cache;
+
+void __init audit_task_init(void)
+{
+	audit_task_cache = kmem_cache_create("audit_task",
+					     sizeof(struct audit_task_info),
+					     0, SLAB_PANIC, NULL);
+}
+
+/**
+ * audit_alloc - allocate an audit info block for a task
+ * @tsk: task
+ *
+ * Call audit_alloc_syscall to filter on the task information and
+ * allocate a per-task audit context if necessary.  This is called from
+ * copy_process, so no lock is needed.
+ */
+int audit_alloc(struct task_struct *tsk)
+{
+	int ret = 0;
+	struct audit_task_info *info;
+
+	info = kmem_cache_alloc(audit_task_cache, GFP_KERNEL);
+	if (!info) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	info->loginuid = audit_get_loginuid(current);
+	info->sessionid = audit_get_sessionid(current);
+	tsk->audit = info;
+
+	ret = audit_alloc_syscall(tsk);
+	if (ret) {
+		tsk->audit = NULL;
+		kmem_cache_free(audit_task_cache, info);
+	}
+out:
+	return ret;
+}
+
+struct audit_task_info init_struct_audit = {
+	.loginuid = INVALID_UID,
+	.sessionid = AUDIT_SID_UNSET,
+#ifdef CONFIG_AUDITSYSCALL
+	.ctx = NULL,
+#endif
+};
+
+/**
+ * audit_free - free per-task audit info
+ * @tsk: task whose audit info block to free
+ *
+ * Called from copy_process and do_exit
+ */
+void audit_free(struct task_struct *tsk)
+{
+	struct audit_task_info *info = tsk->audit;
+
+	audit_free_syscall(tsk);
+	/* Freeing the audit_task_info struct must be performed after
+	 * audit_log_exit() due to need for loginuid and sessionid.
+	 */
+	info = tsk->audit;
+	tsk->audit = NULL;
+	kmem_cache_free(audit_task_cache, info);
+}
+
 /**
  * auditd_test_task - Check to see if a given task is an audit daemon
  * @task: the task to check
@@ -2309,8 +2376,8 @@ int audit_set_loginuid(kuid_t loginuid)
 			sessionid = (unsigned int)atomic_inc_return(&session_id);
 	}
 
-	current->sessionid = sessionid;
-	current->loginuid = loginuid;
+	current->audit->sessionid = sessionid;
+	current->audit->loginuid = loginuid;
 out:
 	audit_log_set_loginuid(oldloginuid, loginuid, oldsessionid, sessionid, rc);
 	return rc;
diff --git a/kernel/audit.h b/kernel/audit.h
index f0233dc40b17..9bee09757068 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -251,6 +251,8 @@ extern void audit_log_d_path_exe(struct audit_buffer *ab,
 extern unsigned int audit_serial(void);
 extern int auditsc_get_stamp(struct audit_context *ctx,
 			      struct timespec64 *t, unsigned int *serial);
+extern int audit_alloc_syscall(struct task_struct *tsk);
+extern void audit_free_syscall(struct task_struct *tsk);
 
 extern void audit_put_watch(struct audit_watch *watch);
 extern void audit_get_watch(struct audit_watch *watch);
@@ -299,6 +301,9 @@ static inline void audit_clear_dummy(struct audit_context *ctx)
 
 #else /* CONFIG_AUDITSYSCALL */
 #define auditsc_get_stamp(c, t, s) 0
+#define audit_alloc_syscall(t) 0
+#define audit_free_syscall(t) {}
+
 #define audit_put_watch(w) {}
 #define audit_get_watch(w) {}
 #define audit_to_watch(k, p, l, o) (-EINVAL)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 468a23390457..f00c1da587ea 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -914,23 +914,25 @@ static inline struct audit_context *audit_alloc_context(enum audit_state state)
 	return context;
 }
 
-/**
- * audit_alloc - allocate an audit context block for a task
+/*
+ * audit_alloc_syscall - allocate an audit context block for a task
  * @tsk: task
  *
  * Filter on the task information and allocate a per-task audit context
  * if necessary.  Doing so turns on system call auditing for the
- * specified task.  This is called from copy_process, so no lock is
- * needed.
+ * specified task.  This is called from copy_process via audit_alloc, so
+ * no lock is needed.
  */
-int audit_alloc(struct task_struct *tsk)
+int audit_alloc_syscall(struct task_struct *tsk)
 {
 	struct audit_context *context;
 	enum audit_state     state;
 	char *key = NULL;
 
-	if (likely(!audit_ever_enabled))
+	if (likely(!audit_ever_enabled)) {
+		audit_set_context(tsk, NULL);
 		return 0; /* Return if not auditing. */
+	}
 
 	state = audit_filter_task(tsk, &key);
 	if (state == AUDIT_DISABLED) {
@@ -940,7 +942,7 @@ int audit_alloc(struct task_struct *tsk)
 
 	if (!(context = audit_alloc_context(state))) {
 		kfree(key);
-		audit_log_lost("out of memory in audit_alloc");
+		audit_log_lost("out of memory in audit_alloc_syscall");
 		return -ENOMEM;
 	}
 	context->filterkey = key;
@@ -1582,14 +1584,15 @@ static void audit_log_exit(void)
 }
 
 /**
- * __audit_free - free a per-task audit context
+ * audit_free_syscall - free per-task audit context info
  * @tsk: task whose audit context block to free
  *
- * Called from copy_process and do_exit
+ * Called from audit_free
  */
-void __audit_free(struct task_struct *tsk)
+void audit_free_syscall(struct task_struct *tsk)
 {
-	struct audit_context *context = tsk->audit_context;
+	struct audit_task_info *info = tsk->audit;
+	struct audit_context *context = info->ctx;
 
 	if (!context)
 		return;
@@ -1612,7 +1615,6 @@ void __audit_free(struct task_struct *tsk)
 		if (context->current_state == AUDIT_RECORD_CONTEXT)
 			audit_log_exit();
 	}
-
 	audit_set_context(tsk, NULL);
 	audit_free_context(context);
 }
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..bacbff0e8a75 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2019,7 +2019,6 @@ static __latent_entropy struct task_struct *copy_process(
 	posix_cputimers_init(&p->posix_cputimers);
 
 	p->io_context = NULL;
-	audit_set_context(p, NULL);
 	cgroup_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_dup(p->mempolicy);
-- 
1.8.3.1



* [PATCH ghak90 V9 02/13] audit: add container id
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
  2020-06-27 13:20 ` [PATCH ghak90 V9 01/13] audit: collect audit task parameters Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-04 13:29   ` Paul Moore
  2020-07-05 15:09   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 03/13] audit: read container ID of a process Richard Guy Briggs
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Implement the proc fs write to set the audit container identifier of a
process, emitting an AUDIT_CONTAINER_OP record to document the event.

This is a write from the container orchestrator task to a proc entry of
the form /proc/PID/audit_containerid where PID is the process ID of the
newly created task that is to become the first task in a container, or
an additional task added to a container.

The write expects a decimal value that fits in a u64 (unset:
18446744073709551615).

The writer must have capability CAP_AUDIT_CONTROL.
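
As a rough userspace sketch of that write (not part of the patch): the
helper names contid_proc_path() and set_audit_contid() are made up for
illustration, and the code assumes a kernel carrying this series so that
the proc entry exists.  The value is sent as decimal text in a single
write(), since the kernel side rejects writes at a non-zero offset.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/audit_containerid"; 0 on success, -1 if buf is too small. */
static int contid_proc_path(pid_t pid, char *buf, size_t len)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/proc/%ld/audit_containerid", (long)pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write contid as decimal text in one write().
 * Returns 0 on success, -1 with errno set otherwise.  The caller is
 * assumed to hold CAP_AUDIT_CONTROL.
 */
static int set_audit_contid(pid_t pid, uint64_t contid)
{
	char path[64], val[32];
	int fd, len, saved;
	ssize_t written;

	if (contid == UINT64_MAX) {	/* the unset value is not accepted */
		errno = EINVAL;
		return -1;
	}
	if (contid_proc_path(pid, path, sizeof(path)) < 0) {
		errno = ENAMETOOLONG;
		return -1;
	}
	len = snprintf(val, sizeof(val), "%" PRIu64, contid);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	written = write(fd, val, (size_t)len);
	saved = errno;
	close(fd);
	errno = saved;
	return written == (ssize_t)len ? 0 : -1;
}
```

An orchestrator would call set_audit_contid(child_pid, 123456) once,
before the child forks or spawns threads, since the kernel side refuses
tasks that already have children or additional threads.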

This will produce a record such as this:
  type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615

The "op" field indicates an initial set.  The "opid" field is the
object's PID, the process being "contained".  New and old audit
container identifier values are given in the "contid" and "old-contid"
fields.
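
For illustration only, the fields of such a record could be pulled apart
in userspace roughly as follows.  The struct container_op and the
parse_container_op() helper are hypothetical names, not part of the
audit userspace library, and the sketch assumes the field order shown in
the sample record above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the fields of one CONTAINER_OP record. */
struct container_op {
	char op[16];
	int opid;
	uint64_t contid;
	uint64_t old_contid;
};

/* Returns 0 on success, -1 if the expected fields are not found. */
static int parse_container_op(const char *line, struct container_op *rec)
{
	const char *p;

	assert(line != NULL && rec != NULL);
	/* Skip the "type=... msg=audit(...) :" prefix. */
	p = strstr(line, " op=");
	if (!p)
		return -1;
	if (sscanf(p, " op=%15s opid=%d contid=%" SCNu64 " old-contid=%" SCNu64,
		   rec->op, &rec->opid, &rec->contid, &rec->old_contid) != 4)
		return -1;
	return 0;
}
```

An old-contid equal to UINT64_MAX means the task had no audit container
identifier before the operation.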

It is not permitted to unset the audit container identifier.
A child inherits its parent's audit container identifier.

Store the audit container identifier in a refcounted kernel object that
is added to the master list of audit container identifiers.  This will
allow multiple container orchestrators/engines to work on the same
machine without danger of inadvertently re-using an existing identifier.
It will also allow an orchestrator to inject a process into an existing
container by checking if the original container owner is the one
injecting the task.  A hash table list is used to optimize searches.
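
The lookup rule described above can be modelled in userspace as a rough
sketch.  This is not the kernel code: there is no locking or RCU, the
owner is a plain integer instead of a task_struct pointer, and the names
contobj_get() and contobj_put() are invented for the example.  Only the
bucket count and the mask-based hash mirror the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define CONTID_BUCKETS 32	/* power of two, as AUDIT_CONTID_BUCKETS */

/* Userspace stand-in for struct audit_contobj. */
struct contobj {
	struct contobj *next;
	uint64_t id;
	int owner;		/* stand-in for the owning orchestrator */
	unsigned int refcount;
};

static struct contobj *buckets[CONTID_BUCKETS];

static unsigned int hash_contid(uint64_t contid)
{
	return (unsigned int)(contid & (CONTID_BUCKETS - 1));
}

/*
 * Take a reference on contid on behalf of caller.  An existing id may
 * only be joined by its owner; anyone else gets -ENOTUNIQ.
 */
static int contobj_get(uint64_t contid, int caller)
{
	unsigned int h = hash_contid(contid);
	struct contobj *c;

	for (c = buckets[h]; c; c = c->next) {
		if (c->id != contid)
			continue;
		if (c->owner != caller)
			return -ENOTUNIQ;
		c->refcount++;		/* task injection by the owner */
		return 0;
	}
	c = malloc(sizeof(*c));
	if (!c)
		return -ENOMEM;
	c->id = contid;
	c->owner = caller;
	c->refcount = 1;
	c->next = buckets[h];
	buckets[h] = c;
	return 0;
}

/* Drop one reference; the object leaves the list when the count hits zero. */
static void contobj_put(uint64_t contid)
{
	struct contobj **pp = &buckets[hash_contid(contid)];

	for (; *pp; pp = &(*pp)->next) {
		struct contobj *c = *pp;

		if (c->id != contid)
			continue;
		assert(c->refcount > 0);
		if (--c->refcount == 0) {
			*pp = c->next;
			free(c);
		}
		return;
	}
}
```

Once the last reference is dropped the identifier becomes available
again, which is why a second orchestrator can only claim it after every
task of the first container has exited.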

Please see the github audit kernel issue for the main feature:
  https://github.com/linux-audit/audit-kernel/issues/90
Please see the github audit userspace issue for supporting additions:
  https://github.com/linux-audit/audit-userspace/issues/51
Please see the github audit testsuite issue for the test case:
  https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
  https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 fs/proc/base.c             |  36 +++++++++++
 include/linux/audit.h      |  33 ++++++++++
 include/uapi/linux/audit.h |   2 +
 kernel/audit.c             | 148 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/audit.h             |   8 +++
 5 files changed, 227 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index d86c0afc8a85..6c17ab32e71b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1317,6 +1317,40 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
 	.read		= proc_sessionid_read,
 	.llseek		= generic_file_llseek,
 };
+
+static ssize_t proc_contid_write(struct file *file, const char __user *buf,
+				   size_t count, loff_t *ppos)
+{
+	struct inode *inode = file_inode(file);
+	u64 contid;
+	int rv;
+	struct task_struct *task = get_proc_task(inode);
+
+	if (!task)
+		return -ESRCH;
+	if (*ppos != 0) {
+		/* No partial writes. */
+		put_task_struct(task);
+		return -EINVAL;
+	}
+
+	rv = kstrtou64_from_user(buf, count, 10, &contid);
+	if (rv < 0) {
+		put_task_struct(task);
+		return rv;
+	}
+
+	rv = audit_set_contid(task, contid);
+	put_task_struct(task);
+	if (rv < 0)
+		return rv;
+	return count;
+}
+
+static const struct file_operations proc_contid_operations = {
+	.write		= proc_contid_write,
+	.llseek		= generic_file_llseek,
+};
 #endif
 
 #ifdef CONFIG_FAULT_INJECTION
@@ -3219,6 +3253,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
 #ifdef CONFIG_AUDIT
 	REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUGO, proc_sessionid_operations),
+	REG("audit_containerid", S_IWUSR, proc_contid_operations),
 #endif
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
@@ -3558,6 +3593,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
 #ifdef CONFIG_AUDIT
 	REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUGO, proc_sessionid_operations),
+	REG("audit_containerid", S_IWUSR, proc_contid_operations),
 #endif
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
diff --git a/include/linux/audit.h b/include/linux/audit.h
index c2150415f9df..2800d4f1a2a8 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -100,9 +100,18 @@ enum audit_nfcfgop {
 	AUDIT_XT_OP_UNREGISTER,
 };
 
+struct audit_contobj {
+	struct list_head	list;
+	u64			id;
+	struct task_struct	*owner;
+	refcount_t		refcount;
+	struct rcu_head         rcu;
+};
+
 struct audit_task_info {
 	kuid_t			loginuid;
 	unsigned int		sessionid;
+	struct audit_contobj	*cont;
 #ifdef CONFIG_AUDITSYSCALL
 	struct audit_context	*ctx;
 #endif
@@ -204,6 +213,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 	return tsk->audit->sessionid;
 }
 
+extern int audit_set_contid(struct task_struct *tsk, u64 contid);
+
+static inline u64 audit_get_contid(struct task_struct *tsk)
+{
+	if (!tsk->audit || !tsk->audit->cont)
+		return AUDIT_CID_UNSET;
+	return tsk->audit->cont->id;
+}
+
 extern u32 audit_enabled;
 
 extern int audit_signal_info(int sig, struct task_struct *t);
@@ -268,6 +286,11 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 	return AUDIT_SID_UNSET;
 }
 
+static inline u64 audit_get_contid(struct task_struct *tsk)
+{
+	return AUDIT_CID_UNSET;
+}
+
 #define audit_enabled AUDIT_OFF
 
 static inline int audit_signal_info(int sig, struct task_struct *t)
@@ -692,6 +715,16 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
 	return uid_valid(audit_get_loginuid(tsk));
 }
 
+static inline bool audit_contid_valid(u64 contid)
+{
+	return contid != AUDIT_CID_UNSET;
+}
+
+static inline bool audit_contid_set(struct task_struct *tsk)
+{
+	return audit_contid_valid(audit_get_contid(tsk));
+}
+
 static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
 {
 	audit_log_n_string(ab, buf, strlen(buf));
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 9b6a973f4cc3..859382527210 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -71,6 +71,7 @@
 #define AUDIT_TTY_SET		1017	/* Set TTY auditing status */
 #define AUDIT_SET_FEATURE	1018	/* Turn an audit feature on or off */
 #define AUDIT_GET_FEATURE	1019	/* Get which features are enabled */
+#define AUDIT_CONTAINER_OP	1020	/* Define the container id and info */
 
 #define AUDIT_FIRST_USER_MSG	1100	/* Userspace messages mostly uninteresting to kernel */
 #define AUDIT_USER_AVC		1107	/* We filter this differently */
@@ -491,6 +492,7 @@ struct audit_tty_status {
 
 #define AUDIT_UID_UNSET (unsigned int)-1
 #define AUDIT_SID_UNSET ((unsigned int)-1)
+#define AUDIT_CID_UNSET ((u64)-1)
 
 /* audit_rule_data supports filter rules with both integer and string
  * fields.  It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
diff --git a/kernel/audit.c b/kernel/audit.c
index 5d8147a29291..6d387793f702 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -138,6 +138,13 @@ struct auditd_connection {
 
 /* Hash for inode-based rules */
 struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
+/* Hash for contid object lists */
+struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
+/* Lock all additions and deletions to the contid hash lists, assignment
+ * of container objects to tasks.  There should be no need for
+ * interaction with tasklist_lock
+ */
+static DEFINE_SPINLOCK(audit_contobj_list_lock);
 
 static struct kmem_cache *audit_buffer_cache;
 
@@ -212,6 +219,33 @@ void __init audit_task_init(void)
 					     0, SLAB_PANIC, NULL);
 }
 
+/* rcu_read_lock must be held by caller unless new */
+static struct audit_contobj *_audit_contobj_hold(struct audit_contobj *cont)
+{
+	if (cont)
+		refcount_inc(&cont->refcount);
+	return cont;
+}
+
+static struct audit_contobj *_audit_contobj_get(struct task_struct *tsk)
+{
+	if (!tsk->audit)
+		return NULL;
+	return _audit_contobj_hold(tsk->audit->cont);
+}
+
+/* rcu_read_lock must be held by caller */
+static void _audit_contobj_put(struct audit_contobj *cont)
+{
+	if (!cont)
+		return;
+	if (refcount_dec_and_test(&cont->refcount)) {
+		put_task_struct(cont->owner);
+		list_del_rcu(&cont->list);
+		kfree_rcu(cont, rcu);
+	}
+}
+
 /**
  * audit_alloc - allocate an audit info block for a task
  * @tsk: task
@@ -232,6 +266,9 @@ int audit_alloc(struct task_struct *tsk)
 	}
 	info->loginuid = audit_get_loginuid(current);
 	info->sessionid = audit_get_sessionid(current);
+	rcu_read_lock();
+	info->cont = _audit_contobj_get(current);
+	rcu_read_unlock();
 	tsk->audit = info;
 
 	ret = audit_alloc_syscall(tsk);
@@ -246,6 +283,7 @@ int audit_alloc(struct task_struct *tsk)
 struct audit_task_info init_struct_audit = {
 	.loginuid = INVALID_UID,
 	.sessionid = AUDIT_SID_UNSET,
+	.cont = NULL,
 #ifdef CONFIG_AUDITSYSCALL
 	.ctx = NULL,
 #endif
@@ -262,6 +300,9 @@ void audit_free(struct task_struct *tsk)
 	struct audit_task_info *info = tsk->audit;
 
 	audit_free_syscall(tsk);
+	rcu_read_lock();
+	_audit_contobj_put(tsk->audit->cont);
+	rcu_read_unlock();
 	/* Freeing the audit_task_info struct must be performed after
 	 * audit_log_exit() due to need for loginuid and sessionid.
 	 */
@@ -1709,6 +1750,9 @@ static int __init audit_init(void)
 	for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
 		INIT_LIST_HEAD(&audit_inode_hash[i]);
 
+	for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
+		INIT_LIST_HEAD(&audit_contid_hash[i]);
+
 	mutex_init(&audit_cmd_mutex.lock);
 	audit_cmd_mutex.owner = NULL;
 
@@ -2410,6 +2454,110 @@ int audit_signal_info(int sig, struct task_struct *t)
 	return audit_signal_info_syscall(t);
 }
 
+/*
+ * audit_set_contid - set the target task's audit contid
+ * @task: target task
+ * @contid: contid value
+ *
+ * Returns 0 on success, -EPERM on permission failure.
+ *
+ * If the original container owner goes away, no task injection is
+ * possible to an existing container.
+ *
+ * Called (set) from fs/proc/base.c::proc_contid_write().
+ */
+int audit_set_contid(struct task_struct *task, u64 contid)
+{
+	int rc = 0;
+	struct audit_buffer *ab;
+	struct audit_contobj *oldcont = NULL;
+
+	task_lock(task);
+	/* Can't set if audit disabled */
+	if (!task->audit) {
+		task_unlock(task);
+		return -ENOPROTOOPT;
+	}
+	read_lock(&tasklist_lock);
+	/* Don't allow the contid to be unset */
+	if (!audit_contid_valid(contid)) {
+		rc = -EINVAL;
+		goto unlock;
+	}
+	/* if we don't have caps, reject */
+	if (!capable(CAP_AUDIT_CONTROL)) {
+		rc = -EPERM;
+		goto unlock;
+	}
+	/* if task has children or is not single-threaded, deny */
+	if (!list_empty(&task->children) ||
+	    !(thread_group_leader(task) && thread_group_empty(task))) {
+		rc = -EBUSY;
+		goto unlock;
+	}
+	/* if contid is already set, deny */
+	if (audit_contid_set(task))
+		rc = -EEXIST;
+unlock:
+	read_unlock(&tasklist_lock);
+	rcu_read_lock();
+	oldcont = _audit_contobj_get(task);
+	if (!rc) {
+		struct audit_contobj *cont = NULL, *newcont = NULL;
+		int h = audit_hash_contid(contid);
+
+		spin_lock(&audit_contobj_list_lock);
+		list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
+			if (cont->id == contid) {
+				/* task injection to existing container */
+				if (current == cont->owner) {
+					_audit_contobj_hold(cont);
+					newcont = cont;
+				} else {
+					rc = -ENOTUNIQ;
+					spin_unlock(&audit_contobj_list_lock);
+					goto conterror;
+				}
+				break;
+			}
+		if (!newcont) {
+			newcont = kmalloc(sizeof(*newcont), GFP_ATOMIC);
+			if (newcont) {
+				INIT_LIST_HEAD(&newcont->list);
+				newcont->id = contid;
+				newcont->owner = get_task_struct(current);
+				refcount_set(&newcont->refcount, 1);
+				list_add_rcu(&newcont->list,
+					     &audit_contid_hash[h]);
+			} else {
+				rc = -ENOMEM;
+				spin_unlock(&audit_contobj_list_lock);
+				goto conterror;
+			}
+		}
+		spin_unlock(&audit_contobj_list_lock);
+		task->audit->cont = newcont;
+		_audit_contobj_put(oldcont);
+	}
+conterror:
+	task_unlock(task);
+
+	if (!audit_enabled)
+		return rc;
+
+	ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
+	if (!ab)
+		return rc;
+
+	audit_log_format(ab,
+			 "op=set opid=%d contid=%llu old-contid=%llu",
+			 task_tgid_nr(task), contid, oldcont ? oldcont->id : -1);
+	_audit_contobj_put(oldcont);
+	rcu_read_unlock();
+	audit_log_end(ab);
+	return rc;
+}
+
 /**
  * audit_log_end - end one audit record
  * @ab: the audit_buffer
diff --git a/kernel/audit.h b/kernel/audit.h
index 9bee09757068..182fc76ea276 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -210,6 +210,14 @@ static inline int audit_hash_ino(u32 ino)
 	return (ino & (AUDIT_INODE_BUCKETS-1));
 }
 
+#define AUDIT_CONTID_BUCKETS	32
+extern struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
+
+static inline int audit_hash_contid(u64 contid)
+{
+	return (contid & (AUDIT_CONTID_BUCKETS-1));
+}
+
 /* Indicates that audit should log the full pathname. */
 #define AUDIT_NAME_FULL -1
 
-- 
1.8.3.1


* [PATCH ghak90 V9 03/13] audit: read container ID of a process
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
  2020-06-27 13:20 ` [PATCH ghak90 V9 01/13] audit: collect audit task parameters Richard Guy Briggs
  2020-06-27 13:20 ` [PATCH ghak90 V9 02/13] audit: add container id Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-06-27 13:20 ` [PATCH ghak90 V9 04/13] audit: log drop of contid on exit of last task Richard Guy Briggs
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Add support for reading the audit container identifier from the proc
filesystem.

This is a read from the proc entry of the form
/proc/PID/audit_containerid where PID is the process ID of the task
whose audit container identifier is sought.

The read returns the identifier as a decimal u64 (unset:
18446744073709551615).

This read requires CAP_AUDIT_CONTROL.
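
A small userspace sketch of handling the value that comes back (the
helper names and the CONTID_* macros are made up for the example).  It
also shows why the patch below raises TMPBUFLEN to 21: the largest u64
needs 20 decimal digits plus a terminating NUL.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define CONTID_UNSET	UINT64_MAX	/* same value as AUDIT_CID_UNSET */
#define CONTID_BUFLEN	21		/* 20 digits + NUL */

/* Format contid as decimal text; returns the number of digits written. */
static int format_contid(uint64_t contid, char buf[CONTID_BUFLEN])
{
	assert(buf != NULL);
	return snprintf(buf, CONTID_BUFLEN, "%" PRIu64, contid);
}

/* Parse the text read from the proc entry; 0 on success, -1 on error. */
static int parse_contid(const char *text, uint64_t *contid)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno || end == text)
		return -1;
	*contid = (uint64_t)v;
	return 0;
}
```

A caller would compare the parsed value against CONTID_UNSET to tell
whether the task has been assigned to a container at all.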

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 fs/proc/base.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6c17ab32e71b..794474cd8f35 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1234,7 +1234,7 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
 };
 
 #ifdef CONFIG_AUDIT
-#define TMPBUFLEN 11
+#define TMPBUFLEN 21
 static ssize_t proc_loginuid_read(struct file * file, char __user * buf,
 				  size_t count, loff_t *ppos)
 {
@@ -1318,6 +1318,24 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
 	.llseek		= generic_file_llseek,
 };
 
+static ssize_t proc_contid_read(struct file *file, char __user *buf,
+				  size_t count, loff_t *ppos)
+{
+	struct inode *inode = file_inode(file);
+	struct task_struct *task;
+	ssize_t length;
+	char tmpbuf[TMPBUFLEN];
+
+	if (!capable(CAP_AUDIT_CONTROL))
+		return -EPERM;
+	task = get_proc_task(inode);
+	if (!task)
+		return -ESRCH;
+	length = scnprintf(tmpbuf, TMPBUFLEN, "%llu", audit_get_contid(task));
+	put_task_struct(task);
+	return simple_read_from_buffer(buf, count, ppos, tmpbuf, length);
+}
+
 static ssize_t proc_contid_write(struct file *file, const char __user *buf,
 				   size_t count, loff_t *ppos)
 {
@@ -1348,6 +1366,7 @@ static ssize_t proc_contid_write(struct file *file, const char __user *buf,
 }
 
 static const struct file_operations proc_contid_operations = {
+	.read		= proc_contid_read,
 	.write		= proc_contid_write,
 	.llseek		= generic_file_llseek,
 };
@@ -3253,7 +3272,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
 #ifdef CONFIG_AUDIT
 	REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUGO, proc_sessionid_operations),
-	REG("audit_containerid", S_IWUSR, proc_contid_operations),
+	REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
 #endif
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
@@ -3593,7 +3612,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
 #ifdef CONFIG_AUDIT
 	REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUGO, proc_sessionid_operations),
-	REG("audit_containerid", S_IWUSR, proc_contid_operations),
+	REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
 #endif
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
-- 
1.8.3.1


* [PATCH ghak90 V9 04/13] audit: log drop of contid on exit of last task
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (2 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 03/13] audit: read container ID of a process Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:10   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 05/13] audit: log container info of syscalls Richard Guy Briggs
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Since we are tracking the life of each audit container identifier, we
can match the creation event with the destruction event.  Log the
destruction of the audit container identifier when the last process in
that container exits.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 kernel/audit.c   | 20 ++++++++++++++++++++
 kernel/audit.h   |  2 ++
 kernel/auditsc.c |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/kernel/audit.c b/kernel/audit.c
index 6d387793f702..9e0b38ce1ead 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2558,6 +2558,26 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 	return rc;
 }
 
+void audit_log_container_drop(void)
+{
+	struct audit_buffer *ab;
+	struct audit_contobj *cont;
+
+	rcu_read_lock();
+	cont = _audit_contobj_get(current);
+	_audit_contobj_put(cont);
+	if (!cont || refcount_read(&cont->refcount) > 1)
+		goto out;
+	ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
+	if (!ab)
+		goto out;
+	audit_log_format(ab, "op=drop opid=%d contid=%llu old-contid=%llu",
+			 task_tgid_nr(current), cont->id, cont->id);
+	audit_log_end(ab);
+out:
+	rcu_read_unlock();
+}
+
 /**
  * audit_log_end - end one audit record
  * @ab: the audit_buffer
diff --git a/kernel/audit.h b/kernel/audit.h
index 182fc76ea276..d07093903008 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -254,6 +254,8 @@ extern void audit_log_d_path_exe(struct audit_buffer *ab,
 extern struct tty_struct *audit_get_tty(void);
 extern void audit_put_tty(struct tty_struct *tty);
 
+extern void audit_log_container_drop(void);
+
 /* audit watch/mark/tree functions */
 #ifdef CONFIG_AUDITSYSCALL
 extern unsigned int audit_serial(void);
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index f00c1da587ea..f03d3eb0752c 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -1575,6 +1575,8 @@ static void audit_log_exit(void)
 
 	audit_log_proctitle();
 
+	audit_log_container_drop();
+
 	/* Send end of event record to help user space know we are finished */
 	ab = audit_log_start(context, GFP_KERNEL, AUDIT_EOE);
 	if (ab)
-- 
1.8.3.1


* [PATCH ghak90 V9 05/13] audit: log container info of syscalls
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (3 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 04/13] audit: log drop of contid on exit of last task Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:10   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 06/13] audit: add contid support for signalling the audit daemon Richard Guy Briggs
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Create a new audit record AUDIT_CONTAINER_ID to document the audit
container identifier of a process if it is present.

Because it is called from audit_log_exit(), syscalls are covered.

Include target_cid references from ptrace and signal.

A sample raw event:
type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
type=CWD msg=audit(1519924845.499:257): cwd="/root"
type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458

Please see the github audit kernel issue for the main feature:
  https://github.com/linux-audit/audit-kernel/issues/90
Please see the github audit userspace issue for supporting additions:
  https://github.com/linux-audit/audit-userspace/issues/51
Please see the github audit testsuite issue for the test case:
  https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
  https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Steve Grubb <sgrubb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 include/linux/audit.h      |  7 +++++++
 include/uapi/linux/audit.h |  1 +
 kernel/audit.c             | 25 +++++++++++++++++++++++--
 kernel/audit.h             |  4 ++++
 kernel/auditsc.c           | 45 +++++++++++++++++++++++++++++++++++++++------
 5 files changed, 74 insertions(+), 8 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 2800d4f1a2a8..5eeba0efffc2 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -222,6 +222,9 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
 	return tsk->audit->cont->id;
 }
 
+extern void audit_log_container_id(struct audit_context *context,
+				   struct audit_contobj *cont);
+
 extern u32 audit_enabled;
 
 extern int audit_signal_info(int sig, struct task_struct *t);
@@ -291,6 +294,10 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
 	return AUDIT_CID_UNSET;
 }
 
+static inline void audit_log_container_id(struct audit_context *context,
+					  struct audit_contobj *cont)
+{ }
+
 #define audit_enabled AUDIT_OFF
 
 static inline int audit_signal_info(int sig, struct task_struct *t)
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 859382527210..fd98460c983f 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -119,6 +119,7 @@
 #define AUDIT_TIME_ADJNTPVAL	1333	/* NTP value adjustment */
 #define AUDIT_BPF		1334	/* BPF subsystem */
 #define AUDIT_EVENT_LISTENER	1335	/* Task joined multicast read socket */
+#define AUDIT_CONTAINER_ID	1336	/* Container ID */
 
 #define AUDIT_AVC		1400	/* SE Linux avc denial or grant */
 #define AUDIT_SELINUX_ERR	1401	/* Internal SE Linux Errors */
diff --git a/kernel/audit.c b/kernel/audit.c
index 9e0b38ce1ead..a09f8f661234 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -227,7 +227,7 @@ static struct audit_contobj *_audit_contobj_hold(struct audit_contobj *cont)
 	return cont;
 }
 
-static struct audit_contobj *_audit_contobj_get(struct task_struct *tsk)
+struct audit_contobj *_audit_contobj_get(struct task_struct *tsk)
 {
 	if (!tsk->audit)
 		return NULL;
@@ -235,7 +235,7 @@ static struct audit_contobj *_audit_contobj_get(struct task_struct *tsk)
 }
 
 /* rcu_read_lock must be held by caller */
-static void _audit_contobj_put(struct audit_contobj *cont)
+void _audit_contobj_put(struct audit_contobj *cont)
 {
 	if (!cont)
 		return;
@@ -2211,6 +2211,27 @@ void audit_log_session_info(struct audit_buffer *ab)
 	audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
 }
 
+/*
+ * audit_log_container_id - report container info
+ * @context: task or local context for record
+ * @cont: container object to report
+ */
+void audit_log_container_id(struct audit_context *context,
+			    struct audit_contobj *cont)
+{
+	struct audit_buffer *ab;
+
+	if (!cont)
+		return;
+	/* Generate AUDIT_CONTAINER_ID record with container ID */
+	ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
+	if (!ab)
+		return;
+	audit_log_format(ab, "contid=%llu", cont->id);
+	audit_log_end(ab);
+}
+EXPORT_SYMBOL(audit_log_container_id);
+
 void audit_log_key(struct audit_buffer *ab, char *key)
 {
 	audit_log_format(ab, " key=");
diff --git a/kernel/audit.h b/kernel/audit.h
index d07093903008..0c9446f8d52c 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -135,6 +135,7 @@ struct audit_context {
 	kuid_t		    target_uid;
 	unsigned int	    target_sessionid;
 	u32		    target_sid;
+	struct audit_contobj *target_cid;
 	char		    target_comm[TASK_COMM_LEN];
 
 	struct audit_tree_refs *trees, *first_trees;
@@ -218,6 +219,9 @@ static inline int audit_hash_contid(u64 contid)
 	return (contid & (AUDIT_CONTID_BUCKETS-1));
 }
 
+extern struct audit_contobj *_audit_contobj_get(struct task_struct *tsk);
+extern void _audit_contobj_put(struct audit_contobj *cont);
+
 /* Indicates that audit should log the full pathname. */
 #define AUDIT_NAME_FULL -1
 
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index f03d3eb0752c..9e79645e5c0e 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -113,6 +113,7 @@ struct audit_aux_data_pids {
 	kuid_t			target_uid[AUDIT_AUX_PIDS];
 	unsigned int		target_sessionid[AUDIT_AUX_PIDS];
 	u32			target_sid[AUDIT_AUX_PIDS];
+	struct audit_contobj	*target_cid[AUDIT_AUX_PIDS];
 	char 			target_comm[AUDIT_AUX_PIDS][TASK_COMM_LEN];
 	int			pid_count;
 };
@@ -889,13 +890,20 @@ static inline void audit_free_names(struct audit_context *context)
 static inline void audit_free_aux(struct audit_context *context)
 {
 	struct audit_aux_data *aux;
+	struct audit_aux_data_pids *axp;
 
+	_audit_contobj_put(context->target_cid);
 	while ((aux = context->aux)) {
 		context->aux = aux->next;
 		kfree(aux);
 	}
 	while ((aux = context->aux_pids)) {
+		int i;
+
 		context->aux_pids = aux->next;
+		axp = (struct audit_aux_data_pids *)aux;
+		for (i = 0; i < axp->pid_count; i++)
+			_audit_contobj_put(axp->target_cid[i]);
 		kfree(aux);
 	}
 }
@@ -1458,6 +1466,7 @@ static void audit_log_exit(void)
 	struct audit_buffer *ab;
 	struct audit_aux_data *aux;
 	struct audit_names *n;
+	struct audit_contobj *cont;
 
 	context->personality = current->personality;
 
@@ -1541,7 +1550,7 @@ static void audit_log_exit(void)
 	for (aux = context->aux_pids; aux; aux = aux->next) {
 		struct audit_aux_data_pids *axs = (void *)aux;
 
-		for (i = 0; i < axs->pid_count; i++)
+		for (i = 0; i < axs->pid_count; i++) {
 			if (audit_log_pid_context(context, axs->target_pid[i],
 						  axs->target_auid[i],
 						  axs->target_uid[i],
@@ -1549,14 +1558,20 @@ static void audit_log_exit(void)
 						  axs->target_sid[i],
 						  axs->target_comm[i]))
 				call_panic = 1;
+			audit_log_container_id(context, axs->target_cid[i]);
+		}
 	}
 
-	if (context->target_pid &&
-	    audit_log_pid_context(context, context->target_pid,
-				  context->target_auid, context->target_uid,
-				  context->target_sessionid,
-				  context->target_sid, context->target_comm))
+	if (context->target_pid) {
+		if (audit_log_pid_context(context, context->target_pid,
+					  context->target_auid,
+					  context->target_uid,
+					  context->target_sessionid,
+					  context->target_sid,
+					  context->target_comm))
 			call_panic = 1;
+		audit_log_container_id(context, context->target_cid);
+	}
 
 	if (context->pwd.dentry && context->pwd.mnt) {
 		ab = audit_log_start(context, GFP_KERNEL, AUDIT_CWD);
@@ -1575,6 +1590,14 @@ static void audit_log_exit(void)
 
 	audit_log_proctitle();
 
+	rcu_read_lock();
+	cont = _audit_contobj_get(current);
+	rcu_read_unlock();
+	audit_log_container_id(context, cont);
+	rcu_read_lock();
+	_audit_contobj_put(cont);
+	rcu_read_unlock();
+
 	audit_log_container_drop();
 
 	/* Send end of event record to help user space know we are finished */
@@ -2385,6 +2408,10 @@ void __audit_ptrace(struct task_struct *t)
 	context->target_uid = task_uid(t);
 	context->target_sessionid = audit_get_sessionid(t);
 	security_task_getsecid(t, &context->target_sid);
+	rcu_read_lock();
+	_audit_contobj_put(context->target_cid);
+	context->target_cid = _audit_contobj_get(t);
+	rcu_read_unlock();
 	memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
 }
 
@@ -2412,6 +2439,9 @@ int audit_signal_info_syscall(struct task_struct *t)
 		ctx->target_uid = t_uid;
 		ctx->target_sessionid = audit_get_sessionid(t);
 		security_task_getsecid(t, &ctx->target_sid);
+		rcu_read_lock();
+		ctx->target_cid = _audit_contobj_get(t);
+		rcu_read_unlock();
 		memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN);
 		return 0;
 	}
@@ -2433,6 +2463,9 @@ int audit_signal_info_syscall(struct task_struct *t)
 	axp->target_uid[axp->pid_count] = t_uid;
 	axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
 	security_task_getsecid(t, &axp->target_sid[axp->pid_count]);
+	rcu_read_lock();
+	axp->target_cid[axp->pid_count] = _audit_contobj_get(t);
+	rcu_read_unlock();
 	memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN);
 	axp->pid_count++;
 
-- 
1.8.3.1


* [PATCH ghak90 V9 06/13] audit: add contid support for signalling the audit daemon
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (4 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 05/13] audit: log container info of syscalls Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:10   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 07/13] audit: add support for non-syscall auxiliary records Richard Guy Briggs
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Add audit container identifier support to the action of signalling the
audit daemon.

Adding the container identifier to the existing audit_sig_info struct
would mean adding an element to it, so a new record type,
AUDIT_SIGNAL_INFO2, is introduced with a new audit_sig_info2 struct.
Userspace needs corresponding support for the new record request and
reply type.  Older userspace won't break since it won't know to request
this record type.
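
As an illustration only, the reply layout can be modelled in userspace
as below.  This is a sketch, not the kernel code: struct sig_info2, the
fixed-width field types and the helper names sig_info2_pack(),
sig_info2_contid() and sig_info2_ctx_len() are all assumptions made for
the example.  It shows the contid string placed at the start of data[],
followed directly by the security context, with cid_len marking the
boundary.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace mirror of the audit_sig_info2 layout. */
struct sig_info2 {
	uint32_t uid;
	int32_t  pid;
	uint32_t cid_len;	/* length of the contid string at the start of data[] */
	char     data[];	/* contid string, then context; not NUL-terminated here */
};

/*
 * Build a reply: contid as decimal text, immediately followed by ctx.
 * Returns a malloc()ed buffer and stores its total size in *size,
 * or returns NULL on allocation failure.
 */
struct sig_info2 *sig_info2_pack(uint32_t uid, int32_t pid, uint64_t contid,
				 const char *ctx, size_t *size)
{
	char cid[21];	/* up to 20 digits for a u64, plus the NUL */
	int cid_len = snprintf(cid, sizeof(cid), "%" PRIu64, contid);
	size_t ctx_len = strlen(ctx);
	struct sig_info2 *s;

	assert(cid_len > 0 && (size_t)cid_len < sizeof(cid));
	*size = sizeof(*s) + (size_t)cid_len + ctx_len;
	s = malloc(*size);
	if (!s)
		return NULL;
	s->uid = uid;
	s->pid = pid;
	s->cid_len = (uint32_t)cid_len;
	memcpy(s->data, cid, (size_t)cid_len);
	memcpy(s->data + cid_len, ctx, ctx_len);
	return s;
}

/* Recover the numeric contid from the first cid_len bytes of data[]. */
uint64_t sig_info2_contid(const struct sig_info2 *s)
{
	char cid[21] = { 0 };

	assert(s->cid_len < sizeof(cid));
	memcpy(cid, s->data, s->cid_len);
	return strtoull(cid, NULL, 10);
}

/* The context occupies whatever follows the contid string. */
size_t sig_info2_ctx_len(const struct sig_info2 *s, size_t size)
{
	return size - sizeof(*s) - s->cid_len;
}
```

A receiver needs both cid_len and the total message size to split
data[] back into its two parts.  The sketch uses a 21-byte scratch
buffer and passes its full size to snprintf(), since a u64 can need 20
digits and snprintf() writes at most size - 1 characters.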

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 include/linux/audit.h       |  8 ++++
 include/uapi/linux/audit.h  |  1 +
 kernel/audit.c              | 95 ++++++++++++++++++++++++++++++++++++++++++++-
 security/selinux/nlmsgtab.c |  1 +
 4 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 5eeba0efffc2..89cf7c66abe6 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -22,6 +22,13 @@ struct audit_sig_info {
 	char		ctx[];
 };
 
+struct audit_sig_info2 {
+	uid_t		uid;
+	pid_t		pid;
+	u32		cid_len;
+	char		data[];
+};
+
 struct audit_buffer;
 struct audit_context;
 struct inode;
@@ -105,6 +112,7 @@ struct audit_contobj {
 	u64			id;
 	struct task_struct	*owner;
 	refcount_t		refcount;
+	refcount_t		sigflag;
 	struct rcu_head         rcu;
 };
 
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index fd98460c983f..a56ad77069b9 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -72,6 +72,7 @@
 #define AUDIT_SET_FEATURE	1018	/* Turn an audit feature on or off */
 #define AUDIT_GET_FEATURE	1019	/* Get which features are enabled */
 #define AUDIT_CONTAINER_OP	1020	/* Define the container id and info */
+#define AUDIT_SIGNAL_INFO2	1021	/* Get info auditd signal sender */
 
 #define AUDIT_FIRST_USER_MSG	1100	/* Userspace messages mostly uninteresting to kernel */
 #define AUDIT_USER_AVC		1107	/* We filter this differently */
diff --git a/kernel/audit.c b/kernel/audit.c
index a09f8f661234..54dd2cb69402 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -126,6 +126,8 @@ struct auditd_connection {
 kuid_t		audit_sig_uid = INVALID_UID;
 pid_t		audit_sig_pid = -1;
 u32		audit_sig_sid = 0;
+static struct audit_contobj *audit_sig_cid;
+static struct task_struct *audit_sig_atsk;
 
 /* Records can be lost in several ways:
    0) [suppressed in audit_alloc]
@@ -239,7 +241,33 @@ void _audit_contobj_put(struct audit_contobj *cont)
 {
 	if (!cont)
 		return;
-	if (refcount_dec_and_test(&cont->refcount)) {
+	if (refcount_dec_and_test(&cont->refcount) && !refcount_read(&cont->sigflag)) {
+		put_task_struct(cont->owner);
+		list_del_rcu(&cont->list);
+		kfree_rcu(cont, rcu);
+	}
+}
+
+/* rcu_read_lock must be held by caller unless new */
+static struct audit_contobj *_audit_contobj_get_sig(struct task_struct *tsk)
+{
+	struct audit_contobj *cont;
+
+	if (!tsk->audit)
+		return NULL;
+	cont = tsk->audit->cont;
+	if (cont)
+		refcount_set(&cont->sigflag, 1);
+	return cont;
+}
+
+/* rcu_read_lock must be held by caller */
+static void _audit_contobj_put_sig(struct audit_contobj *cont)
+{
+	if (!cont)
+		return;
+	refcount_set(&cont->sigflag, 0);
+	if (!refcount_read(&cont->refcount)) {
 		put_task_struct(cont->owner);
 		list_del_rcu(&cont->list);
 		kfree_rcu(cont, rcu);
@@ -309,6 +337,13 @@ void audit_free(struct task_struct *tsk)
 	info = tsk->audit;
 	tsk->audit = NULL;
 	kmem_cache_free(audit_task_cache, info);
+	rcu_read_lock();
+	if (audit_sig_atsk == tsk) {
+		_audit_contobj_put_sig(audit_sig_cid);
+		audit_sig_cid = NULL;
+		audit_sig_atsk = NULL;
+	}
+	rcu_read_unlock();
 }
 
 /**
@@ -1132,6 +1167,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 	case AUDIT_ADD_RULE:
 	case AUDIT_DEL_RULE:
 	case AUDIT_SIGNAL_INFO:
+	case AUDIT_SIGNAL_INFO2:
 	case AUDIT_TTY_GET:
 	case AUDIT_TTY_SET:
 	case AUDIT_TRIM:
@@ -1294,6 +1330,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct audit_buffer	*ab;
 	u16			msg_type = nlh->nlmsg_type;
 	struct audit_sig_info   *sig_data;
+	struct audit_sig_info2  *sig_data2;
 	char			*ctx = NULL;
 	u32			len;
 
@@ -1559,6 +1596,52 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 				 sig_data, sizeof(*sig_data) + len);
 		kfree(sig_data);
 		break;
+	case AUDIT_SIGNAL_INFO2: {
+		unsigned int contidstrlen = 0;
+
+		len = 0;
+		if (audit_sig_sid) {
+			err = security_secid_to_secctx(audit_sig_sid, &ctx,
+						       &len);
+			if (err)
+				return err;
+		}
+		if (audit_sig_cid) {
+			contidstr = kmalloc(21, GFP_KERNEL);
+			if (!contidstr) {
+				if (audit_sig_sid)
+					security_release_secctx(ctx, len);
+				return -ENOMEM;
+			}
+			contidstrlen = scnprintf(contidstr, 20, "%llu", audit_sig_cid->id);
+		}
+		sig_data2 = kmalloc(sizeof(*sig_data2) + contidstrlen + len, GFP_KERNEL);
+		if (!sig_data2) {
+			if (audit_sig_sid)
+				security_release_secctx(ctx, len);
+			kfree(contidstr);
+			return -ENOMEM;
+		}
+		sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
+		sig_data2->pid = audit_sig_pid;
+		if (audit_sig_cid) {
+			memcpy(sig_data2->data, contidstr, contidstrlen);
+			sig_data2->cid_len = contidstrlen;
+			kfree(contidstr);
+		}
+		if (audit_sig_sid) {
+			memcpy(sig_data2->data + contidstrlen, ctx, len);
+			security_release_secctx(ctx, len);
+		}
+		rcu_read_lock();
+		_audit_contobj_put_sig(audit_sig_cid);
+		rcu_read_unlock();
+		audit_sig_cid = NULL;
+		audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
+				 sig_data2, sizeof(*sig_data2) + contidstrlen + len);
+		kfree(sig_data2);
+		break;
+	}
 	case AUDIT_TTY_GET: {
 		struct audit_tty_status s;
 		unsigned int t;
@@ -2470,6 +2553,11 @@ int audit_signal_info(int sig, struct task_struct *t)
 		else
 			audit_sig_uid = uid;
 		security_task_getsecid(current, &audit_sig_sid);
+		rcu_read_lock();
+		_audit_contobj_put_sig(audit_sig_cid);
+		audit_sig_cid = _audit_contobj_get_sig(current);
+		rcu_read_unlock();
+		audit_sig_atsk = t;
 	}
 
 	return audit_signal_info_syscall(t);
@@ -2532,6 +2620,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 			if (cont->id == contid) {
 				/* task injection to existing container */
 				if (current == cont->owner) {
+					if (!refcount_read(&cont->refcount)) {
+						rc = -ESHUTDOWN;
+						spin_unlock(&audit_contobj_list_lock);
+						goto conterror;
+					}
 					_audit_contobj_hold(cont);
 					newcont = cont;
 				} else {
diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
index b69231918686..8303bb7a63d0 100644
--- a/security/selinux/nlmsgtab.c
+++ b/security/selinux/nlmsgtab.c
@@ -137,6 +137,7 @@ struct nlmsg_perm {
 	{ AUDIT_DEL_RULE,	NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
 	{ AUDIT_USER,		NETLINK_AUDIT_SOCKET__NLMSG_RELAY    },
 	{ AUDIT_SIGNAL_INFO,	NETLINK_AUDIT_SOCKET__NLMSG_READ     },
+	{ AUDIT_SIGNAL_INFO2,	NETLINK_AUDIT_SOCKET__NLMSG_READ     },
 	{ AUDIT_TRIM,		NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
 	{ AUDIT_MAKE_EQUIV,	NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
 	{ AUDIT_TTY_GET,	NETLINK_AUDIT_SOCKET__NLMSG_READ     },
-- 
1.8.3.1



* [PATCH ghak90 V9 07/13] audit: add support for non-syscall auxiliary records
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (5 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 06/13] audit: add contid support for signalling the audit daemon Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:11   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 08/13] audit: add containerid support for user records Richard Guy Briggs
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Standalone audit records are not tied to a syscall context, so their
timestamp and serial number are generated on the fly, which makes each
stamp unique.  The new function audit_alloc_local() generates a local
audit context that is used only for a standalone record and its
auxiliary record(s), so that they share one timestamp and serial number.
The context is discarded immediately after the associated records are
produced.
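
The stamping behaviour can be sketched in userspace as follows.  This
is a simplified model under stated assumptions, not the kernel
implementation: struct local_ctx, next_serial(), ctx_alloc_local() and
ctx_get_stamp() are hypothetical names, and timespec_get() stands in
for the kernel clock.  The point it illustrates is that the stamp is
fixed once at allocation, so every record logged against the same local
context carries the same timestamp and serial number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the audit_context fields used here. */
struct local_ctx {
	bool		local;
	unsigned int	serial;
	struct timespec	ctime;
};

/* Stand-in for audit_serial(): a monotonically increasing counter. */
unsigned int next_serial(void)
{
	static unsigned int serial;

	return ++serial;
}

/* Model of audit_alloc_local(): stamp the context once, at allocation. */
struct local_ctx *ctx_alloc_local(void)
{
	struct local_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->serial = next_serial();
	timespec_get(&ctx->ctime, TIME_UTC);
	ctx->local = true;
	return ctx;
}

/*
 * Model of the stamp lookup: returns 1 and fills in the stamp captured
 * at allocation, or returns 0 if the context cannot supply one.
 */
int ctx_get_stamp(const struct local_ctx *ctx, struct timespec *t,
		  unsigned int *serial)
{
	assert(t && serial);
	if (!ctx || !ctx->local)
		return 0;
	*t = ctx->ctime;
	*serial = ctx->serial;
	return 1;
}
```

Two lookups on one context return identical stamps, while a second
context gets a new serial number.  That is what allows a log consumer
to group a standalone record with its auxiliary record(s).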

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 include/linux/audit.h |  8 ++++++++
 kernel/audit.h        |  1 +
 kernel/auditsc.c      | 33 ++++++++++++++++++++++++++++-----
 3 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 89cf7c66abe6..15d0defc5193 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -330,6 +330,8 @@ static inline int audit_signal_info(int sig, struct task_struct *t)
 
 /* These are defined in auditsc.c */
 				/* Public API */
+extern struct audit_context *audit_alloc_local(gfp_t gfpflags);
+extern void audit_free_context(struct audit_context *context);
 extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
 				  unsigned long a2, unsigned long a3);
 extern void __audit_syscall_exit(int ret_success, long ret_value);
@@ -592,6 +594,12 @@ static inline void audit_log_nfcfg(const char *name, u8 af,
 extern int audit_n_rules;
 extern int audit_signals;
 #else /* CONFIG_AUDITSYSCALL */
+static inline struct audit_context *audit_alloc_local(gfp_t gfpflags)
+{
+	return NULL;
+}
+static inline void audit_free_context(struct audit_context *context)
+{ }
 static inline void audit_syscall_entry(int major, unsigned long a0,
 				       unsigned long a1, unsigned long a2,
 				       unsigned long a3)
diff --git a/kernel/audit.h b/kernel/audit.h
index 0c9446f8d52c..a7f88d76163f 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -98,6 +98,7 @@ struct audit_proctitle {
 struct audit_context {
 	int		    dummy;	/* must be the first element */
 	int		    in_syscall;	/* 1 if task is in a syscall */
+	bool		    local;	/* local context needed */
 	enum audit_state    state, current_state;
 	unsigned int	    serial;     /* serial number for record */
 	int		    major;      /* syscall number */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 9e79645e5c0e..935eb3d2cde9 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -908,11 +908,13 @@ static inline void audit_free_aux(struct audit_context *context)
 	}
 }
 
-static inline struct audit_context *audit_alloc_context(enum audit_state state)
+static inline struct audit_context *audit_alloc_context(enum audit_state state,
+							gfp_t gfpflags)
 {
 	struct audit_context *context;
 
-	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	/* We can be called in atomic context via audit_tg() */
+	context = kzalloc(sizeof(*context), gfpflags);
 	if (!context)
 		return NULL;
 	context->state = state;
@@ -948,7 +950,8 @@ int audit_alloc_syscall(struct task_struct *tsk)
 		return 0;
 	}
 
-	if (!(context = audit_alloc_context(state))) {
+	context = audit_alloc_context(state, GFP_KERNEL);
+	if (!context) {
 		kfree(key);
 		audit_log_lost("out of memory in audit_alloc_syscall");
 		return -ENOMEM;
@@ -960,8 +963,27 @@ int audit_alloc_syscall(struct task_struct *tsk)
 	return 0;
 }
 
-static inline void audit_free_context(struct audit_context *context)
+struct audit_context *audit_alloc_local(gfp_t gfpflags)
 {
+	struct audit_context *context = NULL;
+
+	context = audit_alloc_context(AUDIT_RECORD_CONTEXT, gfpflags);
+	if (!context) {
+		audit_log_lost("out of memory in audit_alloc_local");
+		goto out;
+	}
+	context->serial = audit_serial();
+	ktime_get_coarse_real_ts64(&context->ctime);
+	context->local = true;
+out:
+	return context;
+}
+EXPORT_SYMBOL(audit_alloc_local);
+
+void audit_free_context(struct audit_context *context)
+{
+	if (!context)
+		return;
 	audit_free_module(context);
 	audit_free_names(context);
 	unroll_tree_refs(context, NULL, 0);
@@ -972,6 +994,7 @@ static inline void audit_free_context(struct audit_context *context)
 	audit_proctitle_free(context);
 	kfree(context);
 }
+EXPORT_SYMBOL(audit_free_context);
 
 static int audit_log_pid_context(struct audit_context *context, pid_t pid,
 				 kuid_t auid, kuid_t uid, unsigned int sessionid,
@@ -2204,7 +2227,7 @@ void __audit_inode_child(struct inode *parent,
 int auditsc_get_stamp(struct audit_context *ctx,
 		       struct timespec64 *t, unsigned int *serial)
 {
-	if (!ctx->in_syscall)
+	if (!ctx->in_syscall && !ctx->local)
 		return 0;
 	if (!ctx->serial)
 		ctx->serial = audit_serial();
-- 
1.8.3.1



* [PATCH ghak90 V9 08/13] audit: add containerid support for user records
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (6 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 07/13] audit: add support for non-syscall auxiliary records Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:11   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 09/13] audit: add containerid filtering Richard Guy Briggs
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Add audit container identifier auxiliary record to user event standalone
records.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 kernel/audit.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/audit.c b/kernel/audit.c
index 54dd2cb69402..997c34178ee8 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1212,12 +1212,6 @@ static void audit_log_common_recv_msg(struct audit_context *context,
 	audit_log_task_context(*ab);
 }
 
-static inline void audit_log_user_recv_msg(struct audit_buffer **ab,
-					   u16 msg_type)
-{
-	audit_log_common_recv_msg(NULL, ab, msg_type);
-}
-
 int is_audit_feature_set(int i)
 {
 	return af.features & AUDIT_FEATURE_TO_MASK(i);
@@ -1486,6 +1480,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		err = audit_filter(msg_type, AUDIT_FILTER_USER);
 		if (err == 1) { /* match or error */
 			char *str = data;
+			struct audit_context *context;
+			struct audit_contobj *cont;
 
 			err = 0;
 			if (msg_type == AUDIT_USER_TTY) {
@@ -1493,7 +1489,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 				if (err)
 					break;
 			}
-			audit_log_user_recv_msg(&ab, msg_type);
+			context = audit_alloc_local(GFP_KERNEL);
+			audit_log_common_recv_msg(context, &ab, msg_type);
 			if (msg_type != AUDIT_USER_TTY) {
 				/* ensure NULL termination */
 				str[data_len - 1] = '\0';
@@ -1507,6 +1504,14 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 				audit_log_n_untrustedstring(ab, str, data_len);
 			}
 			audit_log_end(ab);
+			rcu_read_lock();
+			cont = _audit_contobj_get(current);
+			rcu_read_unlock();
+			audit_log_container_id(context, cont);
+			rcu_read_lock();
+			_audit_contobj_put(cont);
+			rcu_read_unlock();
+			audit_free_context(context);
 		}
 		break;
 	case AUDIT_ADD_RULE:
-- 
1.8.3.1



* [PATCH ghak90 V9 09/13] audit: add containerid filtering
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (7 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 08/13] audit: add containerid support for user records Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-06-27 13:20 ` [PATCH ghak90 V9 10/13] audit: add support for containerid to network namespaces Richard Guy Briggs
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Implement audit container identifier filtering using the AUDIT_CONTID
field name.  The u64 value is sent as an 8-byte string since the value
field is only u32.

Sending it as two u32 was considered, but gathering and comparing two
fields was more complex.

The feature indicator is AUDIT_FEATURE_BITMAP_CONTAINERID.
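
The encoding described above can be sketched in userspace like this.
It is an illustration under assumptions, not the patch itself: enum
cmp_op and the helpers contid_pack(), contid_unpack() and
contid_match() are hypothetical names.  The sketch copies the eight
bytes with memcpy() rather than casting the buffer pointer, which
sidesteps alignment concerns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the two operators accepted for AUDIT_CONTID. */
enum cmp_op { OP_EQUAL, OP_NOT_EQUAL };

/* Append the raw bytes of contid to buf; returns the field length. */
size_t contid_pack(unsigned char *buf, uint64_t contid)
{
	assert(buf != NULL);
	memcpy(buf, &contid, sizeof(contid));
	return sizeof(contid);
}

/* Recover the u64; returns 0 on success, -1 if the length is wrong. */
int contid_unpack(const unsigned char *buf, size_t len, uint64_t *contid)
{
	if (len != sizeof(*contid))
		return -1;
	memcpy(contid, buf, sizeof(*contid));
	return 0;
}

/* 64-bit comparison restricted to equal / not equal. */
int contid_match(uint64_t left, enum cmp_op op, uint64_t right)
{
	switch (op) {
	case OP_EQUAL:
		return left == right;
	case OP_NOT_EQUAL:
		return left != right;
	}
	return 0;
}
```

A value such as 0x1122334455667788 survives the round trip intact,
whereas truncating it to u32 would leave 0x55667788 and the rule would
no longer match, which is why the u32 value field cannot carry it.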

Please see the github audit kernel issue for the contid filter feature:
  https://github.com/linux-audit/audit-kernel/issues/91
Please see the github audit userspace issue for filter additions:
  https://github.com/linux-audit/audit-userspace/issues/40
Please see the github audit testsuite issue for the test case:
  https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
  https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 include/linux/audit.h      |  1 +
 include/uapi/linux/audit.h |  5 ++++-
 kernel/audit.h             |  1 +
 kernel/auditfilter.c       | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/auditsc.c           |  4 ++++
 5 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 15d0defc5193..c4a755ae0d61 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -68,6 +68,7 @@ struct audit_field {
 	u32				type;
 	union {
 		u32			val;
+		u64			val64;
 		kuid_t			uid;
 		kgid_t			gid;
 		struct {
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index a56ad77069b9..831c12bdd235 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -271,6 +271,7 @@
 #define AUDIT_LOGINUID_SET	24
 #define AUDIT_SESSIONID	25	/* Session ID */
 #define AUDIT_FSTYPE	26	/* FileSystem Type */
+#define AUDIT_CONTID	27	/* Container ID */
 
 				/* These are ONLY useful when checking
 				 * at syscall exit time (AUDIT_AT_EXIT). */
@@ -352,6 +353,7 @@ enum {
 #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER	0x00000010
 #define AUDIT_FEATURE_BITMAP_LOST_RESET		0x00000020
 #define AUDIT_FEATURE_BITMAP_FILTER_FS		0x00000040
+#define AUDIT_FEATURE_BITMAP_CONTAINERID	0x00000080
 
 #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
 				  AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
@@ -359,7 +361,8 @@ enum {
 				  AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
 				  AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
 				  AUDIT_FEATURE_BITMAP_LOST_RESET | \
-				  AUDIT_FEATURE_BITMAP_FILTER_FS)
+				  AUDIT_FEATURE_BITMAP_FILTER_FS | \
+				  AUDIT_FEATURE_BITMAP_CONTAINERID)
 
 /* deprecated: AUDIT_VERSION_* */
 #define AUDIT_VERSION_LATEST 		AUDIT_FEATURE_BITMAP_ALL
diff --git a/kernel/audit.h b/kernel/audit.h
index a7f88d76163f..34d8ec4bc6ef 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -228,6 +228,7 @@ static inline int audit_hash_contid(u64 contid)
 
 extern int audit_match_class(int class, unsigned syscall);
 extern int audit_comparator(const u32 left, const u32 op, const u32 right);
+extern int audit_comparator64(const u64 left, const u32 op, const u64 right);
 extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right);
 extern int audit_gid_comparator(kgid_t left, u32 op, kgid_t right);
 extern int parent_len(const char *path);
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index a10e2997aa6c..d812698efc1d 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -399,6 +399,7 @@ static int audit_field_valid(struct audit_entry *entry, struct audit_field *f)
 	case AUDIT_FILETYPE:
 	case AUDIT_FIELD_COMPARE:
 	case AUDIT_EXE:
+	case AUDIT_CONTID:
 		/* only equal and not equal valid ops */
 		if (f->op != Audit_not_equal && f->op != Audit_equal)
 			return -EINVAL;
@@ -590,6 +591,14 @@ static struct audit_entry *audit_data_to_entry(struct audit_rule_data *data,
 			entry->rule.buflen += f_val;
 			entry->rule.exe = audit_mark;
 			break;
+		case AUDIT_CONTID:
+			if (f_val != sizeof(u64))
+				goto exit_free;
+			str = audit_unpack_string(&bufp, &remain, f_val);
+			if (IS_ERR(str))
+				goto exit_free;
+			f->val64 = ((u64 *)str)[0];
+			break;
 		default:
 			f->val = f_val;
 			break;
@@ -675,6 +684,11 @@ static struct audit_rule_data *audit_krule_to_data(struct audit_krule *krule)
 			data->buflen += data->values[i] =
 				audit_pack_string(&bufp, audit_mark_path(krule->exe));
 			break;
+		case AUDIT_CONTID:
+			data->buflen += data->values[i] = sizeof(u64);
+			memcpy(bufp, &f->val64, sizeof(u64));
+			bufp += sizeof(u64);
+			break;
 		case AUDIT_LOGINUID_SET:
 			if (krule->pflags & AUDIT_LOGINUID_LEGACY && !f->val) {
 				data->fields[i] = AUDIT_LOGINUID;
@@ -761,6 +775,10 @@ static int audit_compare_rule(struct audit_krule *a, struct audit_krule *b)
 			if (!gid_eq(a->fields[i].gid, b->fields[i].gid))
 				return 1;
 			break;
+		case AUDIT_CONTID:
+			if (a->fields[i].val64 != b->fields[i].val64)
+				return 1;
+			break;
 		default:
 			if (a->fields[i].val != b->fields[i].val)
 				return 1;
@@ -1216,6 +1234,30 @@ int audit_comparator(u32 left, u32 op, u32 right)
 	}
 }
 
+int audit_comparator64(u64 left, u32 op, u64 right)
+{
+	switch (op) {
+	case Audit_equal:
+		return (left == right);
+	case Audit_not_equal:
+		return (left != right);
+	case Audit_lt:
+		return (left < right);
+	case Audit_le:
+		return (left <= right);
+	case Audit_gt:
+		return (left > right);
+	case Audit_ge:
+		return (left >= right);
+	case Audit_bitmask:
+		return (left & right);
+	case Audit_bittest:
+		return ((left & right) == right);
+	default:
+		return 0;
+	}
+}
+
 int audit_uid_comparator(kuid_t left, u32 op, kuid_t right)
 {
 	switch (op) {
@@ -1350,6 +1392,10 @@ int audit_filter(int msgtype, unsigned int listtype)
 				result = audit_comparator(audit_loginuid_set(current),
 							  f->op, f->val);
 				break;
+			case AUDIT_CONTID:
+				result = audit_comparator64(audit_get_contid(current),
+							    f->op, f->val64);
+				break;
 			case AUDIT_MSGTYPE:
 				result = audit_comparator(msgtype, f->op, f->val);
 				break;
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 935eb3d2cde9..baa5709590b4 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -640,6 +640,10 @@ static int audit_filter_rules(struct task_struct *tsk,
 				result = audit_comparator(ctx->sockaddr->ss_family,
 							  f->op, f->val);
 			break;
+		case AUDIT_CONTID:
+			result = audit_comparator64(audit_get_contid(tsk),
+						    f->op, f->val64);
+			break;
 		case AUDIT_SUBJ_USER:
 		case AUDIT_SUBJ_ROLE:
 		case AUDIT_SUBJ_TYPE:
-- 
1.8.3.1



* [PATCH ghak90 V9 10/13] audit: add support for containerid to network namespaces
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (8 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 09/13] audit: add containerid filtering Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:11   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 11/13] audit: contid check descendancy and nesting Richard Guy Briggs
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

In addition to tracking audit container identifiers per network
namespace, this patch adds support to qualify NETFILTER_PKT records.

Audit events could happen in a network namespace outside of a task
context due to packets received from the net that trigger an auditing
rule prior to being associated with a running task.  The network
namespace could be in use by multiple containers by association to the
tasks in that network namespace.  We still want a way to attribute
these events to any potential containers.  Keep a list per network
namespace to track these audit container identifiers.

Add/increment the audit container identifier on:
- initial setting of the audit container identifier via /proc
- clone/fork call that inherits an audit container identifier
- unshare call that inherits an audit container identifier
- setns call that inherits an audit container identifier
Delete/decrement the audit container identifier on:
- an inherited audit container identifier dropped when child set
- process exit
- unshare call that drops a net namespace
- setns call that drops a net namespace
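
The add/increment and delete/decrement rules above amount to a
reference-counted list per network namespace.  A minimal userspace
model, with RCU and the spinlock deliberately left out, might look like
this.  struct contid_entry, struct netns_model and the netns_contid_*()
helpers are hypothetical names for the sketch and do not match the
kernel signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: one entry per contid seen in a namespace. */
struct contid_entry {
	uint64_t		id;
	int			count;	/* tasks in this netns using this contid */
	struct contid_entry	*next;
};

struct netns_model {
	struct contid_entry *head;
};

/* Add a reference: bump the count, or create the entry with count 1. */
int netns_contid_add(struct netns_model *ns, uint64_t id)
{
	struct contid_entry *e;

	assert(ns != NULL);
	for (e = ns->head; e; e = e->next) {
		if (e->id == id) {
			e->count++;
			return 0;
		}
	}
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->id = id;
	e->count = 1;
	e->next = ns->head;
	ns->head = e;
	return 0;
}

/* Drop a reference: decrement, and unlink the entry at zero. */
void netns_contid_del(struct netns_model *ns, uint64_t id)
{
	struct contid_entry **pp, *e;

	assert(ns != NULL);
	for (pp = &ns->head; (e = *pp) != NULL; pp = &e->next) {
		if (e->id != id)
			continue;
		if (--e->count < 1) {
			*pp = e->next;
			free(e);
		}
		return;
	}
}

/* Number of references currently recorded for id, or 0 if absent. */
int netns_contid_count(const struct netns_model *ns, uint64_t id)
{
	const struct contid_entry *e;

	for (e = ns->head; e; e = e->next)
		if (e->id == id)
			return e->count;
	return 0;
}
```

With two tasks in the same namespace sharing one contid, the entry
stays on the list until the second task exits or moves to another
namespace.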

Add audit container identifier auxiliary record(s) to NETFILTER_PKT
event standalone records.  Iterate through all potential audit container
identifiers associated with a network namespace.
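
The record payload produced by that iteration is a "contid=" prefix
followed by a comma-separated list.  A userspace sketch of the
formatting, using the hypothetical helper format_contid_list() over a
plain array rather than the kernel list, could be:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format ids as "contid=<id>,<id>,..." into buf.
 * Returns the string length, 0 if n == 0 (nothing to report),
 * or -1 if buf is too small.
 */
int format_contid_list(char *buf, size_t size, const uint64_t *ids, size_t n)
{
	size_t used = 0;
	size_t i;

	assert(buf != NULL);
	if (size == 0)
		return -1;
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		/* The first id gets the field name, later ids a comma. */
		int w = snprintf(buf + used, size - used, "%s%" PRIu64,
				 i == 0 ? "contid=" : ",", ids[i]);

		if (w < 0 || (size_t)w >= size - used)
			return -1;
		used += (size_t)w;
	}
	return (int)used;
}
```

For the ids 123456 and 7 this yields "contid=123456,7".  An empty list
yields an empty string, the analogue of emitting no record when the
namespace has no associated containers.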

Please see the github audit kernel issue for contid net support:
  https://github.com/linux-audit/audit-kernel/issues/92
Please see the github audit testsuite issue for the test case:
  https://github.com/linux-audit/audit-testsuite/issues/64
Please see the github audit wiki for the feature overview:
  https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 include/linux/audit.h    |  20 ++++++
 kernel/audit.c           | 156 ++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/nsproxy.c         |   4 ++
 net/netfilter/nft_log.c  |  11 +++-
 net/netfilter/xt_AUDIT.c |  11 +++-
 5 files changed, 195 insertions(+), 7 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index c4a755ae0d61..304fbb7c3c5b 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -128,6 +128,13 @@ struct audit_task_info {
 
 extern struct audit_task_info init_struct_audit;
 
+struct audit_contobj_netns {
+	struct list_head	list;
+	struct audit_contobj	*obj;
+	int			count;
+	struct rcu_head		rcu;
+};
+
 extern int is_audit_feature_set(int which);
 
 extern int __init audit_register_class(int class, unsigned *list);
@@ -233,6 +240,11 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
 
 extern void audit_log_container_id(struct audit_context *context,
 				   struct audit_contobj *cont);
+extern void audit_copy_namespaces(struct net *net, struct task_struct *tsk);
+extern void audit_switch_task_namespaces(struct nsproxy *ns,
+					 struct task_struct *p);
+extern void audit_log_netns_contid_list(struct net *net,
+					struct audit_context *context);
 
 extern u32 audit_enabled;
 
@@ -306,6 +318,14 @@ static inline u64 audit_get_contid(struct task_struct *tsk)
 static inline void audit_log_container_id(struct audit_context *context,
 					  struct audit_contobj *cont)
 { }
+static inline void audit_copy_namespaces(struct net *net, struct task_struct *tsk)
+{ }
+static inline void audit_switch_task_namespaces(struct nsproxy *ns,
+						struct task_struct *p)
+{ }
+static inline void audit_log_netns_contid_list(struct net *net,
+					       struct audit_context *context)
+{ }
 
 #define audit_enabled AUDIT_OFF
 
diff --git a/kernel/audit.c b/kernel/audit.c
index 997c34178ee8..a862721dfd9b 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -59,6 +59,7 @@
 #include <linux/freezer.h>
 #include <linux/pid_namespace.h>
 #include <net/netns/generic.h>
+#include <net/net_namespace.h>
 
 #include "audit.h"
 
@@ -86,9 +87,13 @@
 /**
  * struct audit_net - audit private network namespace data
  * @sk: communication socket
+ * @contobj_list: audit container identifier list
+ * @contobj_list_lock: audit container identifier list lock
  */
 struct audit_net {
 	struct sock *sk;
+	struct list_head contobj_list;
+	spinlock_t contobj_list_lock;
 };
 
 /**
@@ -214,6 +219,9 @@ struct audit_reply {
 
 static struct kmem_cache *audit_task_cache;
 
+void audit_netns_contid_add(struct net *net, struct audit_contobj *cont);
+void audit_netns_contid_del(struct net *net, struct audit_contobj *cont);
+
 void __init audit_task_init(void)
 {
 	audit_task_cache = kmem_cache_create("audit_task",
@@ -326,10 +334,17 @@ struct audit_task_info init_struct_audit = {
 void audit_free(struct task_struct *tsk)
 {
 	struct audit_task_info *info = tsk->audit;
+	struct nsproxy *ns = tsk->nsproxy;
+	struct audit_contobj *cont;
 
 	audit_free_syscall(tsk);
 	rcu_read_lock();
-	_audit_contobj_put(tsk->audit->cont);
+	cont = _audit_contobj_get(tsk);
+	if (ns) {
+		audit_netns_contid_del(ns->net_ns, cont);
+		_audit_contobj_put(cont);
+	}
+	_audit_contobj_put(cont);
 	rcu_read_unlock();
 	/* Freeing the audit_task_info struct must be performed after
 	 * audit_log_exit() due to need for loginuid and sessionid.
@@ -437,6 +452,136 @@ static struct sock *audit_get_sk(const struct net *net)
 	return aunet->sk;
 }
 
+void audit_netns_contid_add(struct net *net, struct audit_contobj *cont)
+{
+	struct audit_net *aunet;
+	struct list_head *contobj_list;
+	struct audit_contobj_netns *contns;
+
+	if (!net)
+		return;
+	if (!cont)
+		return;
+	aunet = net_generic(net, audit_net_id);
+	if (!aunet)
+		return;
+	contobj_list = &aunet->contobj_list;
+	rcu_read_lock();
+	spin_lock(&aunet->contobj_list_lock);
+	list_for_each_entry_rcu(contns, contobj_list, list)
+		if (contns->obj == cont) {
+			contns->count++;
+			goto out;
+		}
+	contns = kmalloc(sizeof(*contns), GFP_ATOMIC);
+	if (contns) {
+		INIT_LIST_HEAD(&contns->list);
+		contns->obj = cont;
+		contns->count = 1;
+		list_add_rcu(&contns->list, contobj_list);
+	}
+out:
+	spin_unlock(&aunet->contobj_list_lock);
+	rcu_read_unlock();
+}
+
+void audit_netns_contid_del(struct net *net, struct audit_contobj *cont)
+{
+	struct audit_net *aunet;
+	struct list_head *contobj_list;
+	struct audit_contobj_netns *contns = NULL;
+
+	if (!net)
+		return;
+	if (!cont)
+		return;
+	aunet = net_generic(net, audit_net_id);
+	if (!aunet)
+		return;
+	contobj_list = &aunet->contobj_list;
+	rcu_read_lock();
+	spin_lock(&aunet->contobj_list_lock);
+	list_for_each_entry_rcu(contns, contobj_list, list)
+		if (contns->obj == cont) {
+			contns->count--;
+			if (contns->count < 1) {
+				list_del_rcu(&contns->list);
+				kfree_rcu(contns, rcu);
+			}
+			break;
+		}
+	spin_unlock(&aunet->contobj_list_lock);
+	rcu_read_unlock();
+}
+
+void audit_copy_namespaces(struct net *net, struct task_struct *tsk)
+{
+	struct audit_contobj *cont;
+
+	rcu_read_lock();
+	cont = _audit_contobj_get(tsk);
+	audit_netns_contid_add(net, cont);
+	rcu_read_unlock();
+}
+
+void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
+{
+	struct audit_contobj *cont;
+	struct nsproxy *new = p->nsproxy;
+
+	rcu_read_lock();
+	cont = _audit_contobj_get(p);
+	if (!cont)
+		goto out;
+	audit_netns_contid_del(ns->net_ns, cont);
+	if (new)
+		audit_netns_contid_add(new->net_ns, cont);
+	else
+		_audit_contobj_put(cont);
+	_audit_contobj_put(cont);
+out:
+	rcu_read_unlock();
+}
+
+/**
+ * audit_log_netns_contid_list - List contids for the given network namespace
+ * @net: the network namespace of interest
+ * @context: the audit context to use
+ *
+ * Description:
+ * Issues a CONTAINER_ID record with a CSV list of contids associated
+ * with a network namespace to accompany a NETFILTER_PKT record.
+ */
+void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
+{
+	struct audit_buffer *ab = NULL;
+	struct audit_contobj_netns *cont;
+	struct audit_net *aunet;
+
+	/* Generate AUDIT_CONTAINER_ID record with container ID CSV list */
+	rcu_read_lock();
+	aunet = net_generic(net, audit_net_id);
+	if (!aunet)
+		goto out;
+	list_for_each_entry_rcu(cont, &aunet->contobj_list, list) {
+		if (!ab) {
+			ab = audit_log_start(context, GFP_ATOMIC,
+					     AUDIT_CONTAINER_ID);
+			if (!ab) {
+				audit_log_lost("out of memory in audit_log_netns_contid_list");
+				goto out;
+			}
+			audit_log_format(ab, "contid=");
+		} else
+			audit_log_format(ab, ",");
+		audit_log_format(ab, "%llu", cont->obj->id);
+	}
+	audit_log_end(ab);
+out:
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(audit_log_netns_contid_list);
+
 void audit_panic(const char *message)
 {
 	switch (audit_failure) {
@@ -1786,7 +1931,6 @@ static int __net_init audit_net_init(struct net *net)
 		.flags	= NL_CFG_F_NONROOT_RECV,
 		.groups	= AUDIT_NLGRP_MAX,
 	};
-
 	struct audit_net *aunet = net_generic(net, audit_net_id);
 
 	aunet->sk = netlink_kernel_create(net, NETLINK_AUDIT, &cfg);
@@ -1795,7 +1939,8 @@ static int __net_init audit_net_init(struct net *net)
 		return -ENOMEM;
 	}
 	aunet->sk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
-
+	INIT_LIST_HEAD(&aunet->contobj_list);
+	spin_lock_init(&aunet->contobj_list_lock);
 	return 0;
 }
 
@@ -2585,6 +2730,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 	int rc = 0;
 	struct audit_buffer *ab;
 	struct audit_contobj *oldcont = NULL;
+	struct net *net = task->nsproxy->net_ns;
 
 	task_lock(task);
 	/* Can't set if audit disabled */
@@ -2657,6 +2803,10 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 		spin_unlock(&audit_contobj_list_lock);
 		task->audit->cont = newcont;
 		_audit_contobj_put(oldcont);
+		audit_netns_contid_del(net, oldcont);
+		_audit_contobj_put(oldcont);
+		_audit_contobj_hold(newcont);
+		audit_netns_contid_add(net, newcont);
 	}
 conterror:
 	task_unlock(task);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index b03df67621d0..5eddb3377049 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -26,6 +26,7 @@
 #include <linux/syscalls.h>
 #include <linux/cgroup.h>
 #include <linux/perf_event.h>
+#include <linux/audit.h>
 
 static struct kmem_cache *nsproxy_cachep;
 
@@ -187,6 +188,8 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk)
 	}
 
 	tsk->nsproxy = new_ns;
+	if (flags & CLONE_NEWNET)
+		audit_copy_namespaces(new_ns->net_ns, tsk);
 	return 0;
 }
 
@@ -249,6 +252,7 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
 	ns = p->nsproxy;
 	p->nsproxy = new;
 	task_unlock(p);
+	audit_switch_task_namespaces(ns, p);
 
 	if (ns && atomic_dec_and_test(&ns->count))
 		free_nsproxy(ns);
diff --git a/net/netfilter/nft_log.c b/net/netfilter/nft_log.c
index fe4831f2258f..98d1e7e1a83c 100644
--- a/net/netfilter/nft_log.c
+++ b/net/netfilter/nft_log.c
@@ -66,13 +66,16 @@ static void nft_log_eval_audit(const struct nft_pktinfo *pkt)
 	struct sk_buff *skb = pkt->skb;
 	struct audit_buffer *ab;
 	int fam = -1;
+	struct audit_context *context;
+	struct net *net;
 
 	if (!audit_enabled)
 		return;
 
-	ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
+	context = audit_alloc_local(GFP_ATOMIC);
+	ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
 	if (!ab)
-		return;
+		goto errout;
 
 	audit_log_format(ab, "mark=%#x", skb->mark);
 
@@ -99,6 +102,10 @@ static void nft_log_eval_audit(const struct nft_pktinfo *pkt)
 		audit_log_format(ab, " saddr=? daddr=? proto=-1");
 
 	audit_log_end(ab);
+	net = xt_net(&pkt->xt);
+	audit_log_netns_contid_list(net, context);
+errout:
+	audit_free_context(context);
 }
 
 static void nft_log_eval(const struct nft_expr *expr,
diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
index 9cdc16b0d0d8..ecf868a1abde 100644
--- a/net/netfilter/xt_AUDIT.c
+++ b/net/netfilter/xt_AUDIT.c
@@ -68,10 +68,13 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
 {
 	struct audit_buffer *ab;
 	int fam = -1;
+	struct audit_context *context;
+	struct net *net;
 
 	if (audit_enabled == AUDIT_OFF)
-		goto errout;
-	ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
+		goto out;
+	context = audit_alloc_local(GFP_ATOMIC);
+	ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
 	if (ab == NULL)
 		goto errout;
 
@@ -101,7 +104,11 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
 
 	audit_log_end(ab);
 
+	net = xt_net(par);
+	audit_log_netns_contid_list(net, context);
 errout:
+	audit_free_context(context);
+out:
 	return XT_CONTINUE;
 }
 
-- 
1.8.3.1



* [PATCH ghak90 V9 11/13] audit: contid check descendancy and nesting
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (9 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 10/13] audit: add support for containerid to network namespaces Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:11   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 12/13] audit: track container nesting Richard Guy Briggs
  2020-06-27 13:20 ` [PATCH ghak90 V9 13/13] audit: add capcontid to set contid outside init_user_ns Richard Guy Briggs
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Require the target task to be a descendant of the container
orchestrator/engine.

You would only change the audit container ID from one set or inherited
value to another if you were nesting containers.

If changing the contid, the container orchestrator/engine must be a
descendant of, and not the same orchestrator as, the one that set it, so
it is not possible to change the contid of another orchestrator's
container.
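
As an illustration only, the two checks described above can be sketched in
userspace C.  The struct toy_task type, its fields and the toy_* helpers
are hypothetical stand-ins for task_struct and the audit helpers, not
kernel API, and the sketch ignores thread groups, RCU and locking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only what the sketch needs. */
struct toy_task {
	int pid;
	struct toy_task *real_parent;	/* NULL at the root of the tree */
	struct toy_task *cont_owner;	/* task that set the contid, or NULL */
};

/* Walk up from child; true if parent is met on the way (or is child). */
static bool toy_is_descendant(const struct toy_task *parent,
			      const struct toy_task *child)
{
	const struct toy_task *walker;

	if (!parent || !child)
		return false;
	for (walker = child; walker; walker = walker->real_parent)
		if (walker == parent)
			return true;
	return false;
}

/*
 * Nesting rule: an already-set contid may only be changed by a caller
 * that is not the owner that set it and that descends from that owner.
 */
static bool toy_contid_isnesting(const struct toy_task *caller,
				 const struct toy_task *target)
{
	const struct toy_task *owner = target->cont_owner;

	if (!owner)
		return false;
	return caller != owner && toy_is_descendant(owner, caller);
}

/* Returns 0 if caller may set target's contid, -1 otherwise. */
static int toy_may_set_contid(const struct toy_task *caller,
			      const struct toy_task *target)
{
	/* The target must be a strict descendant of the caller. */
	if (target == caller || !toy_is_descendant(caller, target))
		return -1;
	/* Already set, and the caller is not a nested orchestrator. */
	if (target->cont_owner && !toy_contid_isnesting(caller, target))
		return -1;
	return 0;
}
```

In this toy model an orchestrator can set the contid of its own child
once, cannot set it a second time, and a nested orchestrator running
inside that container can set it again on its own descendants.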

Since the task_is_descendant() function is used in YAMA and in audit,
remove the duplication and pull the function into kernel/sched/core.c.

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 include/linux/sched.h    |  3 +++
 kernel/audit.c           | 23 +++++++++++++++++++++--
 kernel/sched/core.c      | 33 +++++++++++++++++++++++++++++++++
 security/yama/yama_lsm.c | 33 ---------------------------------
 4 files changed, 57 insertions(+), 35 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2213ac670386..06938d0b9e0c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2047,4 +2047,7 @@ static inline void rseq_syscall(struct pt_regs *regs)
 
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
 
+extern int task_is_descendant(struct task_struct *parent,
+			      struct task_struct *child);
+
 #endif
diff --git a/kernel/audit.c b/kernel/audit.c
index a862721dfd9b..efa65ec01239 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2713,6 +2713,20 @@ int audit_signal_info(int sig, struct task_struct *t)
 	return audit_signal_info_syscall(t);
 }
 
+static bool audit_contid_isnesting(struct task_struct *tsk)
+{
+	bool isowner = false;
+	bool ownerisparent = false;
+
+	rcu_read_lock();
+	if (tsk->audit && tsk->audit->cont) {
+		isowner = current == tsk->audit->cont->owner;
+		ownerisparent = task_is_descendant(tsk->audit->cont->owner, current);
+	}
+	rcu_read_unlock();
+	return !isowner && ownerisparent;
+}
+
 /*
  * audit_set_contid - set current task's audit contid
  * @task: target task
@@ -2755,8 +2769,13 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 		rc = -EBUSY;
 		goto unlock;
 	}
-	/* if contid is already set, deny */
-	if (audit_contid_set(task))
+	/* if task is not descendant, block */
+	if (task == current || !task_is_descendant(current, task)) {
+		rc = -EXDEV;
+		goto unlock;
+	}
+	/* only allow contid setting again if nesting */
+	if (audit_contid_set(task) && !audit_contid_isnesting(task))
 		rc = -EEXIST;
 unlock:
 	read_unlock(&tasklist_lock);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8f360326861e..e6b24c52b3c3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8012,6 +8012,39 @@ void dump_cpu_task(int cpu)
 }
 
 /*
+ * task_is_descendant - walk up a process family tree looking for a match
+ * @parent: the process to compare against while walking up from child
+ * @child: the process to start from while looking upwards for parent
+ *
+ * Returns 1 if child is a descendant of parent, 0 if not.
+ */
+int task_is_descendant(struct task_struct *parent,
+			      struct task_struct *child)
+{
+	int rc = 0;
+	struct task_struct *walker = child;
+
+	if (!parent || !child)
+		return 0;
+
+	rcu_read_lock();
+	if (!thread_group_leader(parent))
+		parent = rcu_dereference(parent->group_leader);
+	while (walker->pid > 0) {
+		if (!thread_group_leader(walker))
+			walker = rcu_dereference(walker->group_leader);
+		if (walker == parent) {
+			rc = 1;
+			break;
+		}
+		walker = rcu_dereference(walker->real_parent);
+	}
+	rcu_read_unlock();
+
+	return rc;
+}
+
+/*
  * Nice levels are multiplicative, with a gentle 10% change for every
  * nice level changed. I.e. when a CPU-bound task goes from nice 0 to
  * nice 1, it will get ~10% less CPU time than another CPU-bound task
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 536c99646f6a..24939f765df5 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -263,39 +263,6 @@ static int yama_task_prctl(int option, unsigned long arg2, unsigned long arg3,
 }
 
 /**
- * task_is_descendant - walk up a process family tree looking for a match
- * @parent: the process to compare against while walking up from child
- * @child: the process to start from while looking upwards for parent
- *
- * Returns 1 if child is a descendant of parent, 0 if not.
- */
-static int task_is_descendant(struct task_struct *parent,
-			      struct task_struct *child)
-{
-	int rc = 0;
-	struct task_struct *walker = child;
-
-	if (!parent || !child)
-		return 0;
-
-	rcu_read_lock();
-	if (!thread_group_leader(parent))
-		parent = rcu_dereference(parent->group_leader);
-	while (walker->pid > 0) {
-		if (!thread_group_leader(walker))
-			walker = rcu_dereference(walker->group_leader);
-		if (walker == parent) {
-			rc = 1;
-			break;
-		}
-		walker = rcu_dereference(walker->real_parent);
-	}
-	rcu_read_unlock();
-
-	return rc;
-}
-
-/**
  * ptracer_exception_found - tracer registered as exception for this tracee
  * @tracer: the task_struct of the process attempting ptrace
  * @tracee: the task_struct of the process to be ptraced
-- 
1.8.3.1



* [PATCH ghak90 V9 12/13] audit: track container nesting
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (10 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 11/13] audit: contid check descendancy and nesting Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:11   ` Paul Moore
  2020-06-27 13:20 ` [PATCH ghak90 V9 13/13] audit: add capcontid to set contid outside init_user_ns Richard Guy Briggs
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Track the parent container of a container to be able to filter and
report nesting.

Now that we have a way to track and check the parent container of a
container, modify the contid field format to be able to report that
nesting using a caret ("^") modifier to mark each parent contid.  The
original field format was "contid=<contid>" for task-associated records
and "contid=<contid>[,<contid>[...]]" for network-namespace-associated
records.  The new field format is
"contid=<contid>[,^<contid>[...]][,<contid>[...]]".

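To make the nested field format concrete, here is a minimal userspace
sketch of how one container's chain of parents could be rendered.  The
struct toy_contobj type and toy_format_contid() are hypothetical names
used for illustration; they only mirror the id and parent members of
struct audit_contobj and are not the kernel implementation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct audit_contobj. */
struct toy_contobj {
	unsigned long long id;
	const struct toy_contobj *parent;	/* NULL for an outermost container */
};

/*
 * Write "<id>[,^<parent id>[,^<grandparent id>...]]" into buf.
 * Returns the number of characters stored, excluding the NUL.
 * Output is truncated if buf is too small.
 */
static size_t toy_format_contid(char *buf, size_t size,
				const struct toy_contobj *cont)
{
	size_t len = 0;

	if (size == 0)
		return 0;
	buf[0] = '\0';
	if (!cont) {
		/* The patch logs "-1" when no container object is attached. */
		int n = snprintf(buf, size, "-1");

		if (n < 0)
			return 0;
		return (size_t)n < size ? (size_t)n : size - 1;
	}
	while (cont && len < size - 1) {
		size_t room = size - len;
		/* Append at buf + len so earlier ids are not overwritten. */
		int n = snprintf(buf + len, room, "%llu%s", cont->id,
				 cont->parent ? ",^" : "");

		if (n < 0)
			break;
		len += (size_t)n < room ? (size_t)n : room - 1;
		cont = cont->parent;
	}
	return len;
}
```

For a container with id 456 nested inside a container with id 123, the
sketch produces "456,^123", which would be reported as
"contid=456,^123".
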
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 include/linux/audit.h |  1 +
 kernel/audit.c        | 60 ++++++++++++++++++++++++++++++++++++++++++---------
 kernel/audit.h        |  2 ++
 kernel/auditfilter.c  | 17 ++++++++++++++-
 kernel/auditsc.c      |  2 +-
 5 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 304fbb7c3c5b..025b52ae8422 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -115,6 +115,7 @@ struct audit_contobj {
 	refcount_t		refcount;
 	refcount_t		sigflag;
 	struct rcu_head         rcu;
+	struct audit_contobj	*parent;
 };
 
 struct audit_task_info {
diff --git a/kernel/audit.c b/kernel/audit.c
index efa65ec01239..aaf74702e993 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -221,6 +221,7 @@ struct audit_reply {
 
 void audit_netns_contid_add(struct net *net, struct audit_contobj *cont);
 void audit_netns_contid_del(struct net *net, struct audit_contobj *cont);
+void audit_log_contid(struct audit_buffer *ab, struct audit_contobj *cont);
 
 void __init audit_task_init(void)
 {
@@ -277,6 +278,7 @@ static void _audit_contobj_put_sig(struct audit_contobj *cont)
 	refcount_set(&cont->sigflag, 0);
 	if (!refcount_read(&cont->refcount)) {
 		put_task_struct(cont->owner);
+		_audit_contobj_put(cont->parent);
 		list_del_rcu(&cont->list);
 		kfree_rcu(cont, rcu);
 	}
@@ -574,7 +576,7 @@ void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
 			audit_log_format(ab, "contid=");
 		} else
 			audit_log_format(ab, ",");
-		audit_log_format(ab, "%llu", cont->obj->id);
+		audit_log_contid(ab, cont->obj);
 	}
 	audit_log_end(ab);
 out:
@@ -1747,7 +1749,9 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		kfree(sig_data);
 		break;
 	case AUDIT_SIGNAL_INFO2: {
+		char *contidstr = NULL;
 		unsigned int contidstrlen = 0;
+		struct audit_contobj *cont = audit_sig_cid;
 
 		len = 0;
 		if (audit_sig_sid) {
@@ -1757,13 +1761,27 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 				return err;
 		}
 		if (audit_sig_cid) {
-			contidstr = kmalloc(21, GFP_KERNEL);
+			contidstr = kmalloc(AUDIT_MESSAGE_TEXT_MAX, GFP_KERNEL);
 			if (!contidstr) {
 				if (audit_sig_sid)
 					security_release_secctx(ctx, len);
 				return -ENOMEM;
 			}
-			contidstrlen = scnprintf(contidstr, 20, "%llu", audit_sig_cid->id);
+			rcu_read_lock();
+			while (cont) {
+				if (cont->parent)
+					contidstrlen += scnprintf(contidstr + contidstrlen,
+								  AUDIT_MESSAGE_TEXT_MAX -
+								  contidstrlen,
+								  "%llu,^", cont->id);
+				else
+					contidstrlen += scnprintf(contidstr + contidstrlen,
+								  AUDIT_MESSAGE_TEXT_MAX -
+								  contidstrlen,
+								  "%llu", cont->id);
+				cont = cont->parent;
+			}
+			rcu_read_unlock();
 		}
 		sig_data2 = kmalloc(sizeof(*sig_data2) + contidstrlen + len, GFP_KERNEL);
 		if (!sig_data2) {
@@ -2444,6 +2462,23 @@ void audit_log_session_info(struct audit_buffer *ab)
 	audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
 }
 
+void audit_log_contid(struct audit_buffer *ab, struct audit_contobj *cont)
+{
+	if (!cont) {
+		audit_log_format(ab, "-1");
+		return;
+	}
+	rcu_read_lock();
+	while (cont) {
+		if (cont->parent)
+			audit_log_format(ab, "%llu,^", cont->id);
+		else
+			audit_log_format(ab, "%llu", cont->id);
+		cont = cont->parent;
+	}
+	rcu_read_unlock();
+}
+
 /*
  * audit_log_container_id - report container info
  * @context: task or local context for record
@@ -2460,7 +2495,8 @@ void audit_log_container_id(struct audit_context *context,
 	ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
 	if (!ab)
 		return;
-	audit_log_format(ab, "contid=%llu", contid);
+	audit_log_format(ab, "contid=");
+	audit_log_contid(ab, cont);
 	audit_log_end(ab);
 }
 EXPORT_SYMBOL(audit_log_container_id);
@@ -2810,6 +2846,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 				INIT_LIST_HEAD(&newcont->list);
 				newcont->id = contid;
 				newcont->owner = get_task_struct(current);
+				newcont->parent = _audit_contobj_get(newcont->owner);
 				refcount_set(&newcont->refcount, 1);
 				list_add_rcu(&newcont->list,
 					     &audit_contid_hash[h]);
@@ -2828,6 +2865,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 		audit_netns_contid_add(net, newcont);
 	}
 conterror:
+	rcu_read_unlock();
 	task_unlock(task);
 
 	if (!audit_enabled)
@@ -2837,12 +2875,13 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 	if (!ab)
 		return rc;
 
-	audit_log_format(ab,
-			 "op=set opid=%d contid=%llu old-contid=%llu",
-			 task_tgid_nr(task), contid, oldcont ? oldcont->id : -1);
+	audit_log_format(ab, "op=set opid=%d contid=%llu old-contid=",
+			 task_tgid_nr(task), contid);
+	audit_log_contid(ab, oldcont);
+	audit_log_end(ab);
+	rcu_read_lock();
 	_audit_contobj_put(oldcont);
 	rcu_read_unlock();
-	audit_log_end(ab);
 	return rc;
 }
 
@@ -2859,8 +2898,9 @@ void audit_log_container_drop(void)
 	ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
 	if (!ab)
 		goto out;
-	audit_log_format(ab, "op=drop opid=%d contid=%llu old-contid=%llu",
-			 task_tgid_nr(current), cont->id, cont->id);
+	audit_log_format(ab, "op=drop opid=%d contid=%llu old-contid=",
+			 task_tgid_nr(current), AUDIT_CID_UNSET);
+	audit_log_contid(ab, cont);
 	audit_log_end(ab);
 out:
 	rcu_read_unlock();
diff --git a/kernel/audit.h b/kernel/audit.h
index 34d8ec4bc6ef..7bea5b51124b 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -229,6 +229,8 @@ static inline int audit_hash_contid(u64 contid)
 extern int audit_match_class(int class, unsigned syscall);
 extern int audit_comparator(const u32 left, const u32 op, const u32 right);
 extern int audit_comparator64(const u64 left, const u32 op, const u64 right);
+extern int audit_contid_comparator(const u64 left, const u32 op,
+				   const u64 right);
 extern int audit_uid_comparator(kuid_t left, u32 op, kuid_t right);
 extern int audit_gid_comparator(kgid_t left, u32 op, kgid_t right);
 extern int parent_len(const char *path);
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index d812698efc1d..981c72a8b863 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -1302,6 +1302,21 @@ int audit_gid_comparator(kgid_t left, u32 op, kgid_t right)
 	}
 }
 
+int audit_contid_comparator(u64 left, u32 op, u64 right)
+{
+	struct audit_contobj *cont = NULL;
+	int h;
+	int result = 0;
+
+	h = audit_hash_contid(left);
+	list_for_each_entry_rcu(cont, &audit_contid_hash[h], list) {
+		result = audit_comparator64(cont->id, op, right);
+		if (result)
+			break;
+	}
+	return result;
+}
+
 /**
  * parent_len - find the length of the parent portion of a pathname
  * @path: pathname of which to determine length
@@ -1393,7 +1408,7 @@ int audit_filter(int msgtype, unsigned int listtype)
 							  f->op, f->val);
 				break;
 			case AUDIT_CONTID:
-				result = audit_comparator64(audit_get_contid(current),
+				result = audit_contid_comparator(audit_get_contid(current),
 							    f->op, f->val64);
 				break;
 			case AUDIT_MSGTYPE:
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index baa5709590b4..9198857ac721 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -641,7 +641,7 @@ static int audit_filter_rules(struct task_struct *tsk,
 							  f->op, f->val);
 			break;
 		case AUDIT_CONTID:
-			result = audit_comparator64(audit_get_contid(tsk),
+			result = audit_contid_comparator(audit_get_contid(tsk),
 						    f->op, f->val64);
 			break;
 		case AUDIT_SUBJ_USER:
-- 
1.8.3.1



* [PATCH ghak90 V9 13/13] audit: add capcontid to set contid outside init_user_ns
  2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
                   ` (11 preceding siblings ...)
  2020-06-27 13:20 ` [PATCH ghak90 V9 12/13] audit: track container nesting Richard Guy Briggs
@ 2020-06-27 13:20 ` Richard Guy Briggs
  2020-07-05 15:11   ` Paul Moore
  12 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-06-27 13:20 UTC (permalink / raw)
  To: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel
  Cc: Paul Moore, sgrubb, omosnace, dhowells, simo, eparis, serge,
	ebiederm, nhorman, dwalsh, mpatel, Richard Guy Briggs

Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
process in a non-init user namespace the capability to set audit
container identifiers of individual children.

Provide the /proc/$PID/audit_capcontainerid interface to capcontid.
Valid values are: 1==enabled, 0==disabled

Writing a "1" to this special file for the target process $PID will
enable the target process to set audit container identifiers of its
descendants.

A process must already have CAP_AUDIT_CONTROL in the initial user
namespace or have had audit_capcontid enabled by a previous use of this
feature by its parent on this process in order to be able to enable it
for another process.  The target process must be a descendant of the
calling process.

Report this action in new message type AUDIT_SET_CAPCONTID 1022 with
fields opid= capcontid= old-capcontid=
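
For illustration, a userspace orchestrator might enable the flag for a
child roughly as follows.  This is a sketch under assumptions:
write_proc_value() and enable_capcontid() are hypothetical helper names,
and the file name is the one registered by the REG() entries in this
patch (audit_capcontainerid), which was still under review and may
differ on any given kernel.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write a decimal value to a proc file with a single write() at offset 0;
 * the handler in this patch rejects writes at a non-zero offset.
 * Returns 0 on success or a negative errno value.
 */
static int write_proc_value(const char *path, unsigned int value)
{
	char buf[16];
	int len, fd, err = 0;
	ssize_t n;

	len = snprintf(buf, sizeof(buf), "%u", value);
	if (len < 0 || (size_t)len >= sizeof(buf))
		return -EINVAL;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	n = write(fd, buf, (size_t)len);
	if (n < 0)
		err = -errno;
	else if (n != len)
		err = -EIO;
	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}

/* Ask the kernel to let pid set contids of its descendants (assumed path). */
static int enable_capcontid(pid_t pid)
{
	char path[64];

	snprintf(path, sizeof(path), "/proc/%ld/audit_capcontainerid",
		 (long)pid);
	return write_proc_value(path, 1);
}
```

On a kernel without this series the open() fails and the helper returns
a negative errno value such as -ENOENT.  With the series applied, the
patch's own checks would return -EXDEV when the target is not a
descendant of the caller.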

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
---
 fs/proc/base.c             | 57 +++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/audit.h      | 14 ++++++++++++
 include/uapi/linux/audit.h |  1 +
 kernel/audit.c             | 38 ++++++++++++++++++++++++++++++-
 4 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 794474cd8f35..1083db2ce345 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1329,7 +1329,7 @@ static ssize_t proc_contid_read(struct file *file, char __user *buf,
 	if (!task)
 		return -ESRCH;
 	/* if we don't have caps, reject */
-	if (!capable(CAP_AUDIT_CONTROL))
+	if (!capable(CAP_AUDIT_CONTROL) && !audit_get_capcontid(current))
 		return -EPERM;
 	length = scnprintf(tmpbuf, TMPBUFLEN, "%llu", audit_get_contid(task));
 	put_task_struct(task);
@@ -1370,6 +1370,59 @@ static ssize_t proc_contid_write(struct file *file, const char __user *buf,
 	.write		= proc_contid_write,
 	.llseek		= generic_file_llseek,
 };
+
+static ssize_t proc_capcontid_read(struct file *file, char __user *buf,
+				  size_t count, loff_t *ppos)
+{
+	struct inode *inode = file_inode(file);
+	struct task_struct *task = get_proc_task(inode);
+	ssize_t length;
+	char tmpbuf[TMPBUFLEN];
+
+	if (!task)
+		return -ESRCH;
+	/* if we don't have caps, reject */
+	if (!capable(CAP_AUDIT_CONTROL) && !audit_get_capcontid(current))
+		return -EPERM;
+	length = scnprintf(tmpbuf, TMPBUFLEN, "%u", audit_get_capcontid(task));
+	put_task_struct(task);
+	return simple_read_from_buffer(buf, count, ppos, tmpbuf, length);
+}
+
+static ssize_t proc_capcontid_write(struct file *file, const char __user *buf,
+				   size_t count, loff_t *ppos)
+{
+	struct inode *inode = file_inode(file);
+	u32 capcontid;
+	int rv;
+	struct task_struct *task = get_proc_task(inode);
+
+	if (!task)
+		return -ESRCH;
+	if (*ppos != 0) {
+		/* No partial writes. */
+		put_task_struct(task);
+		return -EINVAL;
+	}
+
+	rv = kstrtou32_from_user(buf, count, 10, &capcontid);
+	if (rv < 0) {
+		put_task_struct(task);
+		return rv;
+	}
+
+	rv = audit_set_capcontid(task, capcontid);
+	put_task_struct(task);
+	if (rv < 0)
+		return rv;
+	return count;
+}
+
+static const struct file_operations proc_capcontid_operations = {
+	.read		= proc_capcontid_read,
+	.write		= proc_capcontid_write,
+	.llseek		= generic_file_llseek,
+};
 #endif
 
 #ifdef CONFIG_FAULT_INJECTION
@@ -3273,6 +3326,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
 	REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUGO, proc_sessionid_operations),
 	REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
+	REG("audit_capcontainerid", S_IWUSR|S_IRUSR, proc_capcontid_operations),
 #endif
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
@@ -3613,6 +3667,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
 	REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
 	REG("sessionid",  S_IRUGO, proc_sessionid_operations),
 	REG("audit_containerid", S_IWUSR|S_IRUSR, proc_contid_operations),
+	REG("audit_capcontainerid", S_IWUSR|S_IRUSR, proc_capcontid_operations),
 #endif
 #ifdef CONFIG_FAULT_INJECTION
 	REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
diff --git a/include/linux/audit.h b/include/linux/audit.h
index 025b52ae8422..2b3a2b6020ed 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -122,6 +122,7 @@ struct audit_task_info {
 	kuid_t			loginuid;
 	unsigned int		sessionid;
 	struct audit_contobj	*cont;
+	u32			capcontid;
 #ifdef CONFIG_AUDITSYSCALL
 	struct audit_context	*ctx;
 #endif
@@ -230,6 +231,14 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 	return tsk->audit->sessionid;
 }
 
+static inline u32 audit_get_capcontid(struct task_struct *tsk)
+{
+	if (!tsk->audit)
+		return 0;
+	return tsk->audit->capcontid;
+}
+
+extern int audit_set_capcontid(struct task_struct *tsk, u32 enable);
 extern int audit_set_contid(struct task_struct *tsk, u64 contid);
 
 static inline u64 audit_get_contid(struct task_struct *tsk)
@@ -311,6 +320,11 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
 	return AUDIT_SID_UNSET;
 }
 
+static inline u32 audit_get_capcontid(struct task_struct *tsk)
+{
+	return 0;
+}
+
 static inline u64 audit_get_contid(struct task_struct *tsk)
 {
 	return AUDIT_CID_UNSET;
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index 831c12bdd235..5e30f4c95dc2 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -73,6 +73,7 @@
 #define AUDIT_GET_FEATURE	1019	/* Get which features are enabled */
 #define AUDIT_CONTAINER_OP	1020	/* Define the container id and info */
 #define AUDIT_SIGNAL_INFO2	1021	/* Get info auditd signal sender */
+#define AUDIT_SET_CAPCONTID	1022	/* Set cap_contid of a task */
 
 #define AUDIT_FIRST_USER_MSG	1100	/* Userspace messages mostly uninteresting to kernel */
 #define AUDIT_USER_AVC		1107	/* We filter this differently */
diff --git a/kernel/audit.c b/kernel/audit.c
index aaf74702e993..454473f2e193 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -307,6 +307,7 @@ int audit_alloc(struct task_struct *tsk)
 	rcu_read_lock();
 	info->cont = _audit_contobj_get(current);
 	rcu_read_unlock();
+	info->capcontid = 0;
 	tsk->audit = info;
 
 	ret = audit_alloc_syscall(tsk);
@@ -322,6 +323,7 @@ struct audit_task_info init_struct_audit = {
 	.loginuid = INVALID_UID,
 	.sessionid = AUDIT_SID_UNSET,
 	.cont = NULL,
+	.capcontid = 0,
 #ifdef CONFIG_AUDITSYSCALL
 	.ctx = NULL,
 #endif
@@ -2763,6 +2765,40 @@ static bool audit_contid_isnesting(struct task_struct *tsk)
 	return !isowner && ownerisparent;
 }
 
+int audit_set_capcontid(struct task_struct *task, u32 enable)
+{
+	u32 oldcapcontid;
+	int rc = 0;
+	struct audit_buffer *ab;
+
+	if (!task->audit)
+		return -ENOPROTOOPT;
+	oldcapcontid = audit_get_capcontid(task);
+	/* if task is not descendant, block */
+	if (task == current || !task_is_descendant(current, task))
+		rc = -EXDEV;
+	else if (current_user_ns() == &init_user_ns) {
+		if (!capable(CAP_AUDIT_CONTROL) &&
+		    !audit_get_capcontid(current))
+			rc = -EPERM;
+	}
+	if (!rc)
+		task->audit->capcontid = enable;
+
+	if (!audit_enabled)
+		return rc;
+
+	ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_SET_CAPCONTID);
+	if (!ab)
+		return rc;
+
+	audit_log_format(ab,
+			 "opid=%d capcontid=%u old-capcontid=%u",
+			 task_tgid_nr(task), enable, oldcapcontid);
+	audit_log_end(ab);
+	return rc;
+}
+
 /*
  * audit_set_contid - set current task's audit contid
  * @task: target task
@@ -2795,7 +2831,7 @@ int audit_set_contid(struct task_struct *task, u64 contid)
 		goto unlock;
 	}
 	/* if we don't have caps, reject */
-	if (!capable(CAP_AUDIT_CONTROL)) {
+	if (!capable(CAP_AUDIT_CONTROL) && !audit_get_capcontid(current)) {
 		rc = -EPERM;
 		goto unlock;
 	}
-- 
1.8.3.1



* Re: [PATCH ghak90 V9 02/13] audit: add container id
  2020-06-27 13:20 ` [PATCH ghak90 V9 02/13] audit: add container id Richard Guy Briggs
@ 2020-07-04 13:29   ` Paul Moore
  2020-07-04 13:30     ` Paul Moore
  2020-07-05 15:09   ` Paul Moore
  1 sibling, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-04 13:29 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Implement the proc fs write to set the audit container identifier of a
> process, emitting an AUDIT_CONTAINER_OP record to document the event.
>
> This is a write from the container orchestrator task to a proc entry of
> the form /proc/PID/audit_containerid where PID is the process ID of the
> newly created task that is to become the first task in a container, or
> an additional task added to a container.
>
> The write expects up to a u64 value (unset: 18446744073709551615).
>
> The writer must have capability CAP_AUDIT_CONTROL.
>
> This will produce a record such as this:
>   type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615
>
> The "op" field indicates an initial set.  The "opid" field is the
> object's PID, the process being "contained".  New and old audit
> container identifier values are given in the "contid" fields.
>
> It is not permitted to unset the audit container identifier.
> A child inherits its parent's audit container identifier.
>
> Store the audit container identifier in a refcounted kernel object that
> is added to the master list of audit container identifiers.  This will
> allow multiple container orchestrators/engines to work on the same
> machine without danger of inadvertently re-using an existing identifier.
> It will also allow an orchestrator to inject a process into an existing
> container by checking if the original container owner is the one
> injecting the task.  A hash table list is used to optimize searches.
>
> Please see the github audit kernel issue for the main feature:
>   https://github.com/linux-audit/audit-kernel/issues/90
> Please see the github audit userspace issue for supporting additions:
>   https://github.com/linux-audit/audit-userspace/issues/51
> Please see the github audit testsuite issue for the test case:
>   https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
>   https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> Acked-by: Steve Grubb <sgrubb@redhat.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  fs/proc/base.c             |  36 +++++++++++
>  include/linux/audit.h      |  33 ++++++++++
>  include/uapi/linux/audit.h |   2 +
>  kernel/audit.c             | 148 +++++++++++++++++++++++++++++++++++++++++++++
>  kernel/audit.h             |   8 +++
>  5 files changed, 227 insertions(+)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index d86c0afc8a85..6c17ab32e71b 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -1317,6 +1317,40 @@ static ssize_t proc_sessionid_read(struct file * file, char __user * buf,
>         .read           = proc_sessionid_read,
>         .llseek         = generic_file_llseek,
>  };
> +
> +static ssize_t proc_contid_write(struct file *file, const char __user *buf,
> +                                  size_t count, loff_t *ppos)
> +{
> +       struct inode *inode = file_inode(file);
> +       u64 contid;
> +       int rv;
> +       struct task_struct *task = get_proc_task(inode);
> +
> +       if (!task)
> +               return -ESRCH;
> +       if (*ppos != 0) {
> +               /* No partial writes. */
> +               put_task_struct(task);
> +               return -EINVAL;
> +       }
> +
> +       rv = kstrtou64_from_user(buf, count, 10, &contid);
> +       if (rv < 0) {
> +               put_task_struct(task);
> +               return rv;
> +       }
> +
> +       rv = audit_set_contid(task, contid);
> +       put_task_struct(task);
> +       if (rv < 0)
> +               return rv;
> +       return count;
> +}
> +
> +static const struct file_operations proc_contid_operations = {
> +       .write          = proc_contid_write,
> +       .llseek         = generic_file_llseek,
> +};
>  #endif
>
>  #ifdef CONFIG_FAULT_INJECTION
> @@ -3219,6 +3253,7 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns,
>  #ifdef CONFIG_AUDIT
>         REG("loginuid",   S_IWUSR|S_IRUGO, proc_loginuid_operations),
>         REG("sessionid",  S_IRUGO, proc_sessionid_operations),
> +       REG("audit_containerid", S_IWUSR, proc_contid_operations),
>  #endif
>  #ifdef CONFIG_FAULT_INJECTION
>         REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> @@ -3558,6 +3593,7 @@ static int proc_tid_comm_permission(struct inode *inode, int mask)
>  #ifdef CONFIG_AUDIT
>         REG("loginuid",  S_IWUSR|S_IRUGO, proc_loginuid_operations),
>         REG("sessionid",  S_IRUGO, proc_sessionid_operations),
> +       REG("audit_containerid", S_IWUSR, proc_contid_operations),
>  #endif
>  #ifdef CONFIG_FAULT_INJECTION
>         REG("make-it-fail", S_IRUGO|S_IWUSR, proc_fault_inject_operations),
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index c2150415f9df..2800d4f1a2a8 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -100,9 +100,18 @@ enum audit_nfcfgop {
>         AUDIT_XT_OP_UNREGISTER,
>  };
>
> +struct audit_contobj {
> +       struct list_head        list;
> +       u64                     id;
> +       struct task_struct      *owner;
> +       refcount_t              refcount;
> +       struct rcu_head         rcu;
> +};
> +
>  struct audit_task_info {
>         kuid_t                  loginuid;
>         unsigned int            sessionid;
> +       struct audit_contobj    *cont;
>  #ifdef CONFIG_AUDITSYSCALL
>         struct audit_context    *ctx;
>  #endif
> @@ -204,6 +213,15 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
>         return tsk->audit->sessionid;
>  }
>
> +extern int audit_set_contid(struct task_struct *tsk, u64 contid);
> +
> +static inline u64 audit_get_contid(struct task_struct *tsk)
> +{
> +       if (!tsk->audit || !tsk->audit->cont)
> +               return AUDIT_CID_UNSET;
> +       return tsk->audit->cont->id;
> +}
> +
>  extern u32 audit_enabled;
>
>  extern int audit_signal_info(int sig, struct task_struct *t);
> @@ -268,6 +286,11 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
>         return AUDIT_SID_UNSET;
>  }
>
> +static inline u64 audit_get_contid(struct task_struct *tsk)
> +{
> +       return AUDIT_CID_UNSET;
> +}
> +
>  #define audit_enabled AUDIT_OFF
>
>  static inline int audit_signal_info(int sig, struct task_struct *t)
> @@ -692,6 +715,16 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
>         return uid_valid(audit_get_loginuid(tsk));
>  }
>
> +static inline bool audit_contid_valid(u64 contid)
> +{
> +       return contid != AUDIT_CID_UNSET;
> +}
> +
> +static inline bool audit_contid_set(struct task_struct *tsk)
> +{
> +       return audit_contid_valid(audit_get_contid(tsk));
> +}
> +
>  static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
>  {
>         audit_log_n_string(ab, buf, strlen(buf));
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index 9b6a973f4cc3..859382527210 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -71,6 +71,7 @@
>  #define AUDIT_TTY_SET          1017    /* Set TTY auditing status */
>  #define AUDIT_SET_FEATURE      1018    /* Turn an audit feature on or off */
>  #define AUDIT_GET_FEATURE      1019    /* Get which features are enabled */
> +#define AUDIT_CONTAINER_OP     1020    /* Define the container id and info */
>
>  #define AUDIT_FIRST_USER_MSG   1100    /* Userspace messages mostly uninteresting to kernel */
>  #define AUDIT_USER_AVC         1107    /* We filter this differently */
> @@ -491,6 +492,7 @@ struct audit_tty_status {
>
>  #define AUDIT_UID_UNSET (unsigned int)-1
>  #define AUDIT_SID_UNSET ((unsigned int)-1)
> +#define AUDIT_CID_UNSET ((u64)-1)
>
>  /* audit_rule_data supports filter rules with both integer and string
>   * fields.  It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 5d8147a29291..6d387793f702 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -138,6 +138,13 @@ struct auditd_connection {
>
>  /* Hash for inode-based rules */
>  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> +/* Hash for contid object lists */
> +struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> +/* Lock all additions and deletions to the contid hash lists, assignment
> + * of container objects to tasks.  There should be no need for
> + * interaction with tasklist_lock
> + */
> +static DEFINE_SPINLOCK(audit_contobj_list_lock);
>
>  static struct kmem_cache *audit_buffer_cache;
>
> @@ -212,6 +219,33 @@ void __init audit_task_init(void)
>                                              0, SLAB_PANIC, NULL);
>  }
>
> +/* rcu_read_lock must be held by caller unless new */
> +static struct audit_contobj *_audit_contobj_hold(struct audit_contobj *cont)
> +{
> +       if (cont)
> +               refcount_inc(&cont->refcount);
> +       return cont;
> +}
> +
> +static struct audit_contobj *_audit_contobj_get(struct task_struct *tsk)
> +{
> +       if (!tsk->audit)
> +               return NULL;
> +       return _audit_contobj_hold(tsk->audit->cont);
> +}
> +
> +/* rcu_read_lock must be held by caller */
> +static void _audit_contobj_put(struct audit_contobj *cont)
> +{
> +       if (!cont)
> +               return;
> +       if (refcount_dec_and_test(&cont->refcount)) {
> +               put_task_struct(cont->owner);
> +               list_del_rcu(&cont->list);
> +               kfree_rcu(cont, rcu);
> +       }
> +}
> +
>  /**
>   * audit_alloc - allocate an audit info block for a task
>   * @tsk: task
> @@ -232,6 +266,9 @@ int audit_alloc(struct task_struct *tsk)
>         }
>         info->loginuid = audit_get_loginuid(current);
>         info->sessionid = audit_get_sessionid(current);
> +       rcu_read_lock();
> +       info->cont = _audit_contobj_get(current);
> +       rcu_read_unlock();
>         tsk->audit = info;
>
>         ret = audit_alloc_syscall(tsk);
> @@ -246,6 +283,7 @@ int audit_alloc(struct task_struct *tsk)
>  struct audit_task_info init_struct_audit = {
>         .loginuid = INVALID_UID,
>         .sessionid = AUDIT_SID_UNSET,
> +       .cont = NULL,
>  #ifdef CONFIG_AUDITSYSCALL
>         .ctx = NULL,
>  #endif
> @@ -262,6 +300,9 @@ void audit_free(struct task_struct *tsk)
>         struct audit_task_info *info = tsk->audit;
>
>         audit_free_syscall(tsk);
> +       rcu_read_lock();
> +       _audit_contobj_put(tsk->audit->cont);
> +       rcu_read_unlock();
>         /* Freeing the audit_task_info struct must be performed after
>          * audit_log_exit() due to need for loginuid and sessionid.
>          */
> @@ -1709,6 +1750,9 @@ static int __init audit_init(void)
>         for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
>                 INIT_LIST_HEAD(&audit_inode_hash[i]);
>
> +       for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
> +               INIT_LIST_HEAD(&audit_contid_hash[i]);
> +
>         mutex_init(&audit_cmd_mutex.lock);
>         audit_cmd_mutex.owner = NULL;
>
> @@ -2410,6 +2454,110 @@ int audit_signal_info(int sig, struct task_struct *t)
>         return audit_signal_info_syscall(t);
>  }
>
> +/*
> + * audit_set_contid - set current task's audit contid
> + * @task: target task
> + * @contid: contid value
> + *
> + * Returns 0 on success, -EPERM on permission failure.
> + *
> + * If the original container owner goes away, no task injection is
> + * possible to an existing container.
> + *
> + * Called (set) from fs/proc/base.c::proc_contid_write().
> + */
> +int audit_set_contid(struct task_struct *task, u64 contid)
> +{
> +       int rc = 0;
> +       struct audit_buffer *ab;
> +       struct audit_contobj *oldcont = NULL;
> +
> +       task_lock(task);
> +       /* Can't set if audit disabled */
> +       if (!task->audit) {
> +               task_unlock(task);
> +               return -ENOPROTOOPT;
> +       }
> +       read_lock(&tasklist_lock);
> +       /* Don't allow the contid to be unset */
> +       if (!audit_contid_valid(contid)) {
> +               rc = -EINVAL;
> +               goto unlock;
> +       }
> +       /* if we don't have caps, reject */
> +       if (!capable(CAP_AUDIT_CONTROL)) {
> +               rc = -EPERM;
> +               goto unlock;
> +       }
> +       /* if task has children or is not single-threaded, deny */
> +       if (!list_empty(&task->children) ||
> +           !(thread_group_leader(task) && thread_group_empty(task))) {
> +               rc = -EBUSY;
> +               goto unlock;
> +       }
> +       /* if contid is already set, deny */
> +       if (audit_contid_set(task))
> +               rc = -EEXIST;
> +unlock:
> +       read_unlock(&tasklist_lock);
> +       rcu_read_lock();
> +       oldcont = _audit_contobj_get(task);
> +       if (!rc) {
> +               struct audit_contobj *cont = NULL, *newcont = NULL;
> +               int h = audit_hash_contid(contid);
> +
> +               spin_lock(&audit_contobj_list_lock);
> +               list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> +                       if (cont->id == contid) {
> +                               /* task injection to existing container */
> +                               if (current == cont->owner) {
> +                                       _audit_contobj_hold(cont);
> +                                       newcont = cont;
> +                               } else {
> +                                       rc = -ENOTUNIQ;
> +                                       spin_unlock(&audit_contobj_list_lock);
> +                                       goto conterror;
> +                               }
> +                               break;
> +                       }
> +               if (!newcont) {
> +                       newcont = kmalloc(sizeof(*newcont), GFP_ATOMIC);
> +                       if (newcont) {
> +                               INIT_LIST_HEAD(&newcont->list);
> +                               newcont->id = contid;
> +                               newcont->owner = get_task_struct(current);
> +                               refcount_set(&newcont->refcount, 1);
> +                               list_add_rcu(&newcont->list,
> +                                            &audit_contid_hash[h]);
> +                       } else {
> +                               rc = -ENOMEM;
> +                               spin_unlock(&audit_contobj_list_lock);
> +                               goto conterror;
> +                       }
> +               }
> +               spin_unlock(&audit_contobj_list_lock);
> +               task->audit->cont = newcont;
> +               _audit_contobj_put(oldcont);
> +       }
> +conterror:
> +       task_unlock(task);
> +
> +       if (!audit_enabled)
> +               return rc;
> +
> +       ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
> +       if (!ab)
> +               return rc;
> +
> +       audit_log_format(ab,
> +                        "op=set opid=%d contid=%llu old-contid=%llu",
> +                        task_tgid_nr(task), contid, oldcont ? oldcont->id : -1);
> +       _audit_contobj_put(oldcont);
> +       rcu_read_unlock();
> +       audit_log_end(ab);
> +       return rc;
> +}
> +
>  /**
>   * audit_log_end - end one audit record
>   * @ab: the audit_buffer
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 9bee09757068..182fc76ea276 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -210,6 +210,14 @@ static inline int audit_hash_ino(u32 ino)
>         return (ino & (AUDIT_INODE_BUCKETS-1));
>  }
>
> +#define AUDIT_CONTID_BUCKETS   32
> +extern struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> +
> +static inline int audit_hash_contid(u64 contid)
> +{
> +       return (contid & (AUDIT_CONTID_BUCKETS-1));
> +}
> +
>  /* Indicates that audit should log the full pathname. */
>  #define AUDIT_NAME_FULL -1
>
> --
> 1.8.3.1
>


-- 
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 02/13] audit: add container id
  2020-07-04 13:29   ` Paul Moore
@ 2020-07-04 13:30     ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-07-04 13:30 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jul 4, 2020 at 9:29 AM Paul Moore <paul@paul-moore.com> wrote:
> On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > Implement the proc fs write to set the audit container identifier of a
> > process, emitting an AUDIT_CONTAINER_OP record to document the event.

Sorry about the email misfire, you can safely ignore that last empty message.


* Re: [PATCH ghak90 V9 01/13] audit: collect audit task parameters
  2020-06-27 13:20 ` [PATCH ghak90 V9 01/13] audit: collect audit task parameters Richard Guy Briggs
@ 2020-07-05 15:09   ` Paul Moore
  2020-07-07  2:50     ` Richard Guy Briggs
  0 siblings, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:09 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:21 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> The audit-related parameters in struct task_struct should ideally be
> collected together and accessed through a standard audit API.
>
> Collect the existing loginuid, sessionid and audit_context together in a
> new struct audit_task_info called "audit" in struct task_struct.
>
> Use kmem_cache to manage this pool of memory.
> Un-inline audit_free() to be able to always recover that memory.
>
> Please see the upstream github issue
> https://github.com/linux-audit/audit-kernel/issues/81
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  include/linux/audit.h | 49 +++++++++++++++++++++++------------
>  include/linux/sched.h |  7 +----
>  init/init_task.c      |  3 +--
>  init/main.c           |  2 ++
>  kernel/audit.c        | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  kernel/audit.h        |  5 ++++
>  kernel/auditsc.c      | 26 ++++++++++---------
>  kernel/fork.c         |  1 -
>  8 files changed, 124 insertions(+), 40 deletions(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 3fcd9ee49734..c2150415f9df 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -100,6 +100,16 @@ enum audit_nfcfgop {
>         AUDIT_XT_OP_UNREGISTER,
>  };
>
> +struct audit_task_info {
> +       kuid_t                  loginuid;
> +       unsigned int            sessionid;
> +#ifdef CONFIG_AUDITSYSCALL
> +       struct audit_context    *ctx;
> +#endif
> +};
> +
> +extern struct audit_task_info init_struct_audit;
> +
>  extern int is_audit_feature_set(int which);
>
>  extern int __init audit_register_class(int class, unsigned *list);

...

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b62e6aaf28f0..2213ac670386 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -34,7 +34,6 @@
>  #include <linux/kcsan.h>
>
>  /* task_struct member predeclarations (sorted alphabetically): */
> -struct audit_context;
>  struct backing_dev_info;
>  struct bio_list;
>  struct blk_plug;
> @@ -946,11 +945,7 @@ struct task_struct {
>         struct callback_head            *task_works;
>
>  #ifdef CONFIG_AUDIT
> -#ifdef CONFIG_AUDITSYSCALL
> -       struct audit_context            *audit_context;
> -#endif
> -       kuid_t                          loginuid;
> -       unsigned int                    sessionid;
> +       struct audit_task_info          *audit;
>  #endif
>         struct seccomp                  seccomp;

In the early days of this patchset we talked a lot about how to handle
the task_struct and the changes that would be necessary, ultimately
deciding to encapsulate all of the audit fields in an
audit_task_info struct.  However, what is puzzling me a bit at this
moment is why we are only including audit_task_info in task_struct by
reference *and* making it a build time conditional (via CONFIG_AUDIT).

If audit is enabled at build time it would seem that we are always
going to allocate an audit_task_info struct, so I have to wonder why
we don't simply embed it inside the task_struct (similar to the
seccomp struct in the snippet above)?  Of course the audit_context
struct needs to remain as is, I'm talking only about the
task_struct/audit_task_info struct.

Richard, I'm sure you can answer this off the top of your head, but
I'd have to go digging through the archives to pull out the relevant
discussions so I figured I would just ask you for a reminder ... ?  I
imagine it's also possible things have changed a bit since those early
discussions and the solution we arrived at then no longer makes as
much sense as it did before.
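
To make the question concrete, here is a minimal userspace sketch of
the two layouts.  It is an illustration only: task_by_ref,
task_embedded and the plain integer fields are hypothetical stand-ins,
not the kernel definitions, and malloc() stands in for the kmem_cache.

```c
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct audit_task_info. */
struct audit_task_info {
	unsigned int loginuid;	/* kuid_t in the kernel */
	unsigned int sessionid;
};

/* Layout used by the patch: the task holds a pointer. */
struct task_by_ref {
	struct audit_task_info *audit;
};

/* Layout being asked about: the struct lives inside the task. */
struct task_embedded {
	struct audit_task_info audit;
};

/* By reference: the allocation can fail and the pointer can be NULL. */
static int task_by_ref_init(struct task_by_ref *t, unsigned int uid,
			    unsigned int sid)
{
	t->audit = malloc(sizeof(*t->audit));
	if (!t->audit)
		return -1;
	t->audit->loginuid = uid;
	t->audit->sessionid = sid;
	return 0;
}

static void task_by_ref_free(struct task_by_ref *t)
{
	free(t->audit);
	t->audit = NULL;
}

/* Embedded: no allocation, no failure path, no NULL check. */
static void task_embedded_init(struct task_embedded *t, unsigned int uid,
			       unsigned int sid)
{
	t->audit.loginuid = uid;
	t->audit.sessionid = sid;
}
```

The embedded form removes the "!tsk->audit" style checks at the cost of
growing the task structure by the size of the audit fields.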

> diff --git a/init/init_task.c b/init/init_task.c
> index 15089d15010a..92d34c4b7702 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -130,8 +130,7 @@ struct task_struct init_task
>         .thread_group   = LIST_HEAD_INIT(init_task.thread_group),
>         .thread_node    = LIST_HEAD_INIT(init_signals.thread_head),
>  #ifdef CONFIG_AUDIT
> -       .loginuid       = INVALID_UID,
> -       .sessionid      = AUDIT_SID_UNSET,
> +       .audit          = &init_struct_audit,
>  #endif
>  #ifdef CONFIG_PERF_EVENTS
>         .perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
> diff --git a/init/main.c b/init/main.c
> index 0ead83e86b5a..349470ad7458 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -96,6 +96,7 @@
>  #include <linux/jump_label.h>
>  #include <linux/mem_encrypt.h>
>  #include <linux/kcsan.h>
> +#include <linux/audit.h>
>
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -1028,6 +1029,7 @@ asmlinkage __visible void __init start_kernel(void)
>         nsfs_init();
>         cpuset_init();
>         cgroup_init();
> +       audit_task_init();
>         taskstats_init_early();
>         delayacct_init();
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 8c201f414226..5d8147a29291 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -203,6 +203,73 @@ struct audit_reply {
>         struct sk_buff *skb;
>  };
>
> +static struct kmem_cache *audit_task_cache;
> +
> +void __init audit_task_init(void)
> +{
> +       audit_task_cache = kmem_cache_create("audit_task",
> +                                            sizeof(struct audit_task_info),
> +                                            0, SLAB_PANIC, NULL);
> +}
> +
> +/**
> + * audit_alloc - allocate an audit info block for a task
> + * @tsk: task
> + *
> + * Call audit_alloc_syscall to filter on the task information and
> + * allocate a per-task audit context if necessary.  This is called from
> + * copy_process, so no lock is needed.
> + */
> +int audit_alloc(struct task_struct *tsk)
> +{
> +       int ret = 0;
> +       struct audit_task_info *info;
> +
> +       info = kmem_cache_alloc(audit_task_cache, GFP_KERNEL);
> +       if (!info) {
> +               ret = -ENOMEM;
> +               goto out;
> +       }
> +       info->loginuid = audit_get_loginuid(current);
> +       info->sessionid = audit_get_sessionid(current);
> +       tsk->audit = info;
> +
> +       ret = audit_alloc_syscall(tsk);
> +       if (ret) {
> +               tsk->audit = NULL;
> +               kmem_cache_free(audit_task_cache, info);
> +       }
> +out:
> +       return ret;
> +}

This is a big nitpick, and I'm only mentioning this in the case you
need to respin this patchset: the "out" label is unnecessary in the
function above.  Simply return the error code, there is no need to
jump to "out" only to immediately return the error code there and
nothing more.
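
Roughly what I have in mind, as a userspace sketch rather than a real
patch.  The names below are hypothetical: struct task,
alloc_syscall_stub() and audit_alloc_sketch() stand in for task_struct,
audit_alloc_syscall() and audit_alloc(), and malloc()/free() stand in
for the kmem_cache calls.

```c
#include <errno.h>
#include <stdlib.h>

struct audit_task_info {
	unsigned int loginuid;
	unsigned int sessionid;
};

struct task {
	struct audit_task_info *audit;
};

/* Hypothetical stand-in for audit_alloc_syscall(); fails on request. */
static int alloc_syscall_stub(struct task *tsk, int fail)
{
	(void)tsk;
	return fail ? -ENOMEM : 0;
}

static int audit_alloc_sketch(struct task *tsk, int fail_syscall)
{
	struct audit_task_info *info;
	int ret;

	info = malloc(sizeof(*info));
	if (!info)
		return -ENOMEM;	/* return directly, no "out" label */
	info->loginuid = 0;
	info->sessionid = 0;
	tsk->audit = info;

	ret = alloc_syscall_stub(tsk, fail_syscall);
	if (ret) {
		/* Undo the assignment so the task never sees freed memory. */
		tsk->audit = NULL;
		free(info);
	}
	return ret;
}
```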

> +struct audit_task_info init_struct_audit = {
> +       .loginuid = INVALID_UID,
> +       .sessionid = AUDIT_SID_UNSET,
> +#ifdef CONFIG_AUDITSYSCALL
> +       .ctx = NULL,
> +#endif
> +};
> +
> +/**
> + * audit_free - free per-task audit info
> + * @tsk: task whose audit info block to free
> + *
> + * Called from copy_process and do_exit
> + */
> +void audit_free(struct task_struct *tsk)
> +{
> +       struct audit_task_info *info = tsk->audit;
> +
> +       audit_free_syscall(tsk);
> +       /* Freeing the audit_task_info struct must be performed after
> +        * audit_log_exit() due to need for loginuid and sessionid.
> +        */
> +       info = tsk->audit;
> +       tsk->audit = NULL;
> +       kmem_cache_free(audit_task_cache, info);

Another nitpick, and this one may even become a moot point given the
question posed above.  However, is there any reason we couldn't get
rid of "info" and simplify this a bit?

  audit_free_syscall(tsk);
  kmem_cache_free(audit_task_cache, tsk->audit);
  tsk->audit = NULL;

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 468a23390457..f00c1da587ea 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -1612,7 +1615,6 @@ void __audit_free(struct task_struct *tsk)
>                 if (context->current_state == AUDIT_RECORD_CONTEXT)
>                         audit_log_exit();
>         }
> -
>         audit_set_context(tsk, NULL);
>         audit_free_context(context);
>  }

This nitpick is barely worth the time it is taking me to write this,
but the whitespace change above isn't strictly necessary.


--
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 02/13] audit: add container id
  2020-06-27 13:20 ` [PATCH ghak90 V9 02/13] audit: add container id Richard Guy Briggs
  2020-07-04 13:29   ` Paul Moore
@ 2020-07-05 15:09   ` Paul Moore
  2020-07-29 20:05     ` Richard Guy Briggs
  1 sibling, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:09 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Implement the proc fs write to set the audit container identifier of a
> process, emitting an AUDIT_CONTAINER_OP record to document the event.
>
> This is a write from the container orchestrator task to a proc entry of
> the form /proc/PID/audit_containerid where PID is the process ID of the
> newly created task that is to become the first task in a container, or
> an additional task added to a container.
>
> The write expects up to a u64 value (unset: 18446744073709551615).
>
> The writer must have capability CAP_AUDIT_CONTROL.
>
> This will produce a record such as this:
>   type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615
>
> The "op" field indicates an initial set.  The "opid" field is the
> object's PID, the process being "contained".  New and old audit
> container identifier values are given in the "contid" fields.
>
> It is not permitted to unset the audit container identifier.
> A child inherits its parent's audit container identifier.
>
> Store the audit container identifier in a refcounted kernel object that
> is added to the master list of audit container identifiers.  This will
> allow multiple container orchestrators/engines to work on the same
> machine without danger of inadvertently re-using an existing identifier.
> It will also allow an orchestrator to inject a process into an existing
> container by checking if the original container owner is the one
> injecting the task.  A hash table list is used to optimize searches.
>
> Please see the github audit kernel issue for the main feature:
>   https://github.com/linux-audit/audit-kernel/issues/90
> Please see the github audit userspace issue for supporting additions:
>   https://github.com/linux-audit/audit-userspace/issues/51
> Please see the github audit testsuite issue for the test case:
>   https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
>   https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> Acked-by: Steve Grubb <sgrubb@redhat.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  fs/proc/base.c             |  36 +++++++++++
>  include/linux/audit.h      |  33 ++++++++++
>  include/uapi/linux/audit.h |   2 +
>  kernel/audit.c             | 148 +++++++++++++++++++++++++++++++++++++++++++++
>  kernel/audit.h             |   8 +++
>  5 files changed, 227 insertions(+)

...

> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index c2150415f9df..2800d4f1a2a8 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -692,6 +715,16 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
>         return uid_valid(audit_get_loginuid(tsk));
>  }
>
> +static inline bool audit_contid_valid(u64 contid)
> +{
> +       return contid != AUDIT_CID_UNSET;
> +}
> +
> +static inline bool audit_contid_set(struct task_struct *tsk)
> +{
> +       return audit_contid_valid(audit_get_contid(tsk));
> +}

This is quasi-nitpicky, but it seems like audit_contid_valid() and
audit_contid_set() should be moved to kernel/audit.h if possible
(possibly even kernel/audit.c).  Maybe I'll see something later in the
patchset, but right now I'm struggling to think of why anyone outside
of audit would need to call these functions.

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 5d8147a29291..6d387793f702 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -138,6 +138,13 @@ struct auditd_connection {
>
>  /* Hash for inode-based rules */
>  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> +/* Hash for contid object lists */
> +struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> +/* Lock all additions and deletions to the contid hash lists, assignment
> + * of container objects to tasks.  There should be no need for
> + * interaction with tasklist_lock
> + */
> +static DEFINE_SPINLOCK(audit_contobj_list_lock);
>
>  static struct kmem_cache *audit_buffer_cache;
>
> @@ -212,6 +219,33 @@ void __init audit_task_init(void)
>                                              0, SLAB_PANIC, NULL);
>  }
>
> +/* rcu_read_lock must be held by caller unless new */
> +static struct audit_contobj *_audit_contobj_hold(struct audit_contobj *cont)
> +{
> +       if (cont)
> +               refcount_inc(&cont->refcount);
> +       return cont;
> +}
> +
> +static struct audit_contobj *_audit_contobj_get(struct task_struct *tsk)
> +{
> +       if (!tsk->audit)
> +               return NULL;
> +       return _audit_contobj_hold(tsk->audit->cont);
> +}
> +
> +/* rcu_read_lock must be held by caller */
> +static void _audit_contobj_put(struct audit_contobj *cont)
> +{
> +       if (!cont)
> +               return;
> +       if (refcount_dec_and_test(&cont->refcount)) {
> +               put_task_struct(cont->owner);
> +               list_del_rcu(&cont->list);

You should check your locking; I'm used to seeing exclusive locks
(e.g. the spinlock) around list adds/removes, it is only
reads/traversals that can be done with just the RCU lock held.
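
As a rough userspace model of the pattern (not the kernel code): every
list modification happens under the exclusive lock, including the
removal triggered by the last put.  Here a pthread mutex stands in for
audit_contobj_list_lock, a hand-rolled singly linked list stands in for
the hash bucket, and a plain free() stands in for kfree_rcu(); all of
the names are hypothetical and the RCU grace period is not modelled.

```c
#include <pthread.h>
#include <stdlib.h>

struct contobj {
	struct contobj *next;
	unsigned long long id;
	int refcount;
};

static struct contobj *contobj_head;
static pthread_mutex_t contobj_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add a new object with one reference; writers take the lock. */
static struct contobj *contobj_add(unsigned long long id)
{
	struct contobj *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->id = id;
	c->refcount = 1;
	pthread_mutex_lock(&contobj_lock);
	c->next = contobj_head;
	contobj_head = c;
	pthread_mutex_unlock(&contobj_lock);
	return c;
}

/* Drop a reference; the unlink on the last put is done under the lock. */
static void contobj_put(struct contobj *c)
{
	struct contobj **pp;
	int last;

	if (!c)
		return;
	pthread_mutex_lock(&contobj_lock);
	last = (--c->refcount == 0);
	if (last) {
		for (pp = &contobj_head; *pp; pp = &(*pp)->next) {
			if (*pp == c) {
				*pp = c->next;
				break;
			}
		}
	}
	pthread_mutex_unlock(&contobj_lock);
	if (last)
		free(c);	/* the kernel would defer this via kfree_rcu() */
}

/* Count list entries; used only to check the sketch. */
static int contobj_count(void)
{
	struct contobj *c;
	int n = 0;

	pthread_mutex_lock(&contobj_lock);
	for (c = contobj_head; c; c = c->next)
		n++;
	pthread_mutex_unlock(&contobj_lock);
	return n;
}
```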

> +               kfree_rcu(cont, rcu);
> +       }
> +}

Another nitpick, but it might be nice to have similar arguments to the
_get() and _put() functions, e.g. struct audit_contobj, but that is
some serious bikeshedding (basically rename _hold() to _get() and
rename the current _get() to audit_task_contid_hold() or similar).

>  /**
>   * audit_alloc - allocate an audit info block for a task
>   * @tsk: task
> @@ -232,6 +266,9 @@ int audit_alloc(struct task_struct *tsk)
>         }
>         info->loginuid = audit_get_loginuid(current);
>         info->sessionid = audit_get_sessionid(current);
> +       rcu_read_lock();
> +       info->cont = _audit_contobj_get(current);
> +       rcu_read_unlock();

The RCU locks aren't strictly necessary here, are they?  In fact I
suppose we could probably just replace the _get() call with a
refcount_set(1) just as we do in audit_set_contid(), yes?

>         tsk->audit = info;
>
>         ret = audit_alloc_syscall(tsk);
> @@ -246,6 +283,7 @@ int audit_alloc(struct task_struct *tsk)
>  struct audit_task_info init_struct_audit = {
>         .loginuid = INVALID_UID,
>         .sessionid = AUDIT_SID_UNSET,
> +       .cont = NULL,
>  #ifdef CONFIG_AUDITSYSCALL
>         .ctx = NULL,
>  #endif
> @@ -262,6 +300,9 @@ void audit_free(struct task_struct *tsk)
>         struct audit_task_info *info = tsk->audit;
>
>         audit_free_syscall(tsk);
> +       rcu_read_lock();
> +       _audit_contobj_put(tsk->audit->cont);
> +       rcu_read_unlock();
>         /* Freeing the audit_task_info struct must be performed after
>          * audit_log_exit() due to need for loginuid and sessionid.
>          */
> @@ -1709,6 +1750,9 @@ static int __init audit_init(void)
>         for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
>                 INIT_LIST_HEAD(&audit_inode_hash[i]);
>
> +       for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
> +               INIT_LIST_HEAD(&audit_contid_hash[i]);
> +
>         mutex_init(&audit_cmd_mutex.lock);
>         audit_cmd_mutex.owner = NULL;
>
> @@ -2410,6 +2454,110 @@ int audit_signal_info(int sig, struct task_struct *t)
>         return audit_signal_info_syscall(t);
>  }
>
> +/*
> + * audit_set_contid - set current task's audit contid
> + * @task: target task
> + * @contid: contid value
> + *
> + * Returns 0 on success, -EPERM on permission failure.
> + *
> + * If the original container owner goes away, no task injection is
> + * possible to an existing container.
> + *
> + * Called (set) from fs/proc/base.c::proc_contid_write().
> + */
> +int audit_set_contid(struct task_struct *task, u64 contid)
> +{
> +       int rc = 0;
> +       struct audit_buffer *ab;
> +       struct audit_contobj *oldcont = NULL;
> +
> +       task_lock(task);
> +       /* Can't set if audit disabled */
> +       if (!task->audit) {
> +               task_unlock(task);
> +               return -ENOPROTOOPT;
> +       }

See my question/comment in patch 1/13; this check may not be needed or
it may need to be changed to something other than "!task->audit".

> +       read_lock(&tasklist_lock);
> +       /* Don't allow the contid to be unset */
> +       if (!audit_contid_valid(contid)) {
> +               rc = -EINVAL;
> +               goto unlock;
> +       }
> +       /* if we don't have caps, reject */
> +       if (!capable(CAP_AUDIT_CONTROL)) {
> +               rc = -EPERM;
> +               goto unlock;
> +       }
> +       /* if task has children or is not single-threaded, deny */
> +       if (!list_empty(&task->children) ||
> +           !(thread_group_leader(task) && thread_group_empty(task))) {
> +               rc = -EBUSY;
> +               goto unlock;
> +       }
> +       /* if contid is already set, deny */
> +       if (audit_contid_set(task))
> +               rc = -EEXIST;
> +unlock:

Can we move the "unlock" target to the end of the function where it
just handles the unlocking and returns an error, including the
AUDIT_CONTAINER_OP record if necessary?  From what I can see we only
jump to "unlock" in case of error where we are not going to set the
audit container ID, yet the "unlock" target is placed in a misleading
location in the middle of the function.  It may be that everything
works correctly, but I would argue this is a bad practice that
increases the likelihood of buggy behavior in future code changes.

If you can't find a way to arrange the code nicely, just duplicate the
"tasklist_lock" unlock operation in the error handlers before jumping
down to the end of the function.  It isn't perfect, but I believe it
will be a lot less fragile than the current approach.
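
To illustrate the layout suggested above, here is a minimal userspace
sketch, not the kernel code.  Everything in it is a hypothetical
stand-in: struct fake_task, the lock counters, INVALID_CONTID_SKETCH and
set_contid_sketch() are invented for the example, and the
AUDIT_CONTAINER_OP record is left out.  It only shows the error labels
sitting at the end of the function, so the success path reads straight
through.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define INVALID_CONTID_SKETCH (~0ULL)	/* hypothetical "unset" value */

/* Hypothetical stand-ins for task_lock() and tasklist_lock. */
static int task_lock_depth;
static int tasklist_lock_depth;

struct fake_task {
	bool has_children;
	bool contid_set;
	unsigned long long contid;
};

static void lock_task(void)       { task_lock_depth++; }
static void unlock_task(void)     { task_lock_depth--; }
static void lock_tasklist(void)   { tasklist_lock_depth++; }
static void unlock_tasklist(void) { tasklist_lock_depth--; }

int set_contid_sketch(struct fake_task *t, unsigned long long contid,
		      bool capable)
{
	int rc = 0;

	lock_task();
	lock_tasklist();
	if (contid == INVALID_CONTID_SKETCH) {
		rc = -EINVAL;
		goto err_tasklist;
	}
	if (!capable) {
		rc = -EPERM;
		goto err_tasklist;
	}
	if (t->has_children) {
		rc = -EBUSY;
		goto err_tasklist;
	}
	if (t->contid_set) {
		rc = -EEXIST;
		goto err_tasklist;
	}
	unlock_tasklist();

	/* Success path only: no error jump lands in this code. */
	t->contid = contid;
	t->contid_set = true;
	goto out;

err_tasklist:
	/* Errors detected under tasklist_lock drop it here. */
	unlock_tasklist();
out:
	unlock_task();
	return rc;
}
```

Both lock counters return to zero on every path, which is the property
that is easy to lose when a label sits in the middle of the function.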


> +       read_unlock(&tasklist_lock);
> +       rcu_read_lock();
> +       oldcont = _audit_contobj_get(task);
> +       if (!rc) {
> +               struct audit_contobj *cont = NULL, *newcont = NULL;
> +               int h = audit_hash_contid(contid);
> +
> +               spin_lock(&audit_contobj_list_lock);
> +               list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> +                       if (cont->id == contid) {
> +                               /* task injection to existing container */
> +                               if (current == cont->owner) {
> +                                       _audit_contobj_hold(cont);
> +                                       newcont = cont;
> +                               } else {
> +                                       rc = -ENOTUNIQ;
> +                                       spin_unlock(&audit_contobj_list_lock);
> +                                       goto conterror;
> +                               }
> +                               break;
> +                       }
> +               if (!newcont) {
> +                       newcont = kmalloc(sizeof(*newcont), GFP_ATOMIC);
> +                       if (newcont) {
> +                               INIT_LIST_HEAD(&newcont->list);
> +                               newcont->id = contid;
> +                               newcont->owner = get_task_struct(current);
> +                               refcount_set(&newcont->refcount, 1);
> +                               list_add_rcu(&newcont->list,
> +                                            &audit_contid_hash[h]);
> +                       } else {
> +                               rc = -ENOMEM;
> +                               spin_unlock(&audit_contobj_list_lock);
> +                               goto conterror;
> +                       }
> +               }
> +               spin_unlock(&audit_contobj_list_lock);
> +               task->audit->cont = newcont;
> +               _audit_contobj_put(oldcont);
> +       }
> +conterror:
> +       task_unlock(task);
> +
> +       if (!audit_enabled)
> +               return rc;
> +
> +       ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
> +       if (!ab)
> +               return rc;
> +
> +       audit_log_format(ab,
> +                        "op=set opid=%d contid=%llu old-contid=%llu",
> +                        task_tgid_nr(task), contid, oldcont ? oldcont->id : -1);
> +       _audit_contobj_put(oldcont);
> +       rcu_read_unlock();
> +       audit_log_end(ab);
> +       return rc;
> +}

--
paul moore
www.paul-moore.com

* Re: [PATCH ghak90 V9 04/13] audit: log drop of contid on exit of last task
  2020-06-27 13:20 ` [PATCH ghak90 V9 04/13] audit: log drop of contid on exit of last task Richard Guy Briggs
@ 2020-07-05 15:10   ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:10 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Since we are tracking the life of each audit container indentifier, we
> can match the creation event with the destruction event.  Log the
> destruction of the audit container identifier when the last process in
> that container exits.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  kernel/audit.c   | 20 ++++++++++++++++++++
>  kernel/audit.h   |  2 ++
>  kernel/auditsc.c |  2 ++
>  3 files changed, 24 insertions(+)

If you end up respinning this patchset it seems like this should be
merged in with patch 2/13.  This way patch 2/13 would include both the
"set" and "drop" records, making that patch a bit more useful on its
own.

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 6d387793f702..9e0b38ce1ead 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2558,6 +2558,26 @@ int audit_set_contid(struct task_struct *task, u64 contid)
>         return rc;
>  }
>
> +void audit_log_container_drop(void)
> +{
> +       struct audit_buffer *ab;
> +       struct audit_contobj *cont;
> +
> +       rcu_read_lock();
> +       cont = _audit_contobj_get(current);
> +       _audit_contobj_put(cont);
> +       if (!cont || refcount_read(&cont->refcount) > 1)
> +               goto out;
> +       ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);

You may want to check on sleeping with RCU locks held, or just use
GFP_ATOMIC to be safe.
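
One way to sidestep the question, besides GFP_ATOMIC, would be to copy
what the record needs inside the read-side section and only allocate
after leaving it.  The following is a userspace sketch under that
assumption, not kernel code: rcu_lock_sketch(), may_sleep_alloc() and
the counters are hypothetical stand-ins, and the sketch ignores the
reference handling and any races the real function has to deal with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct contobj_sketch {
	unsigned long long id;
	int refcount;
};

static int rcu_depth;		/* > 0 while inside a read-side section */
static int sleep_in_rcu;	/* may-sleep allocations made under RCU */
static int drops_logged;
static unsigned long long last_dropped_id;

static void rcu_lock_sketch(void)   { rcu_depth++; }
static void rcu_unlock_sketch(void) { assert(rcu_depth > 0); rcu_depth--; }

/* Stand-in for a GFP_KERNEL allocation, which is allowed to sleep. */
static bool may_sleep_alloc(void)
{
	if (rcu_depth > 0)
		sleep_in_rcu++;	/* this is the bug being guarded against */
	return true;
}

void log_container_drop_sketch(const struct contobj_sketch *cont)
{
	unsigned long long id;
	bool last;

	/* Read everything the record needs while protected ... */
	rcu_lock_sketch();
	if (!cont) {
		rcu_unlock_sketch();
		return;
	}
	id = cont->id;
	last = cont->refcount <= 1;
	rcu_unlock_sketch();

	/* ... and only then do the work that may sleep. */
	if (!last)
		return;
	if (!may_sleep_alloc())
		return;
	last_dropped_id = id;	/* a real record would format the id here */
	drops_logged++;
}
```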


> +       if (!ab)
> +               goto out;
> +       audit_log_format(ab, "op=drop opid=%d contid=%llu old-contid=%llu",
> +                        task_tgid_nr(current), cont->id, cont->id);
> +       audit_log_end(ab);
> +out:
> +       rcu_read_unlock();
> +}
> +
>  /**
>   * audit_log_end - end one audit record
>   * @ab: the audit_buffer
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 182fc76ea276..d07093903008 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -254,6 +254,8 @@ extern void audit_log_d_path_exe(struct audit_buffer *ab,
>  extern struct tty_struct *audit_get_tty(void);
>  extern void audit_put_tty(struct tty_struct *tty);
>
> +extern void audit_log_container_drop(void);
> +
>  /* audit watch/mark/tree functions */
>  #ifdef CONFIG_AUDITSYSCALL
>  extern unsigned int audit_serial(void);
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index f00c1da587ea..f03d3eb0752c 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -1575,6 +1575,8 @@ static void audit_log_exit(void)
>
>         audit_log_proctitle();
>
> +       audit_log_container_drop();
> +
>         /* Send end of event record to help user space know we are finished */
>         ab = audit_log_start(context, GFP_KERNEL, AUDIT_EOE);
>         if (ab)

--
paul moore
www.paul-moore.com

* Re: [PATCH ghak90 V9 05/13] audit: log container info of syscalls
  2020-06-27 13:20 ` [PATCH ghak90 V9 05/13] audit: log container info of syscalls Richard Guy Briggs
@ 2020-07-05 15:10   ` Paul Moore
  2020-07-29 19:40     ` Richard Guy Briggs
  0 siblings, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:10 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Create a new audit record AUDIT_CONTAINER_ID to document the audit
> container identifier of a process if it is present.
>
> Called from audit_log_exit(), syscalls are covered.
>
> Include target_cid references from ptrace and signal.
>
> A sample raw event:
> type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
> type=CWD msg=audit(1519924845.499:257): cwd="/root"
> type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
> type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
>
> Please see the github audit kernel issue for the main feature:
>   https://github.com/linux-audit/audit-kernel/issues/90
> Please see the github audit userspace issue for supporting additions:
>   https://github.com/linux-audit/audit-userspace/issues/51
> Please see the github audit testsuiite issue for the test case:
>   https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
>   https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> Acked-by: Steve Grubb <sgrubb@redhat.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  include/linux/audit.h      |  7 +++++++
>  include/uapi/linux/audit.h |  1 +
>  kernel/audit.c             | 25 +++++++++++++++++++++++--
>  kernel/audit.h             |  4 ++++
>  kernel/auditsc.c           | 45 +++++++++++++++++++++++++++++++++++++++------
>  5 files changed, 74 insertions(+), 8 deletions(-)

...

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 9e0b38ce1ead..a09f8f661234 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2211,6 +2211,27 @@ void audit_log_session_info(struct audit_buffer *ab)
>         audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
>  }
>
> +/*
> + * audit_log_container_id - report container info
> + * @context: task or local context for record
> + * @cont: container object to report
> + */
> +void audit_log_container_id(struct audit_context *context,
> +                           struct audit_contobj *cont)
> +{
> +       struct audit_buffer *ab;
> +
> +       if (!cont)
> +               return;
> +       /* Generate AUDIT_CONTAINER_ID record with container ID */
> +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
> +       if (!ab)
> +               return;
> +       audit_log_format(ab, "contid=%llu", contid);

Did this patch compile?  Where is "contid" coming from?  I'm guessing
you mean to get it from "cont", but that isn't what appears to be
happening; likely a casualty of the object vs token discussion we had
during the last review cycle.

I'm assuming this code gets modified later in this patchset and you
only compile tested the patchset as a whole.  Please make sure the
patchset compiles at each patch along the way to applying them all;
this helps ensure that git bisect remains useful and it fits better
with the general idea that individual patches must have merit on their
own.

... and yes, I do check for this when merging patchsets, it isn't just
a visual inspection, I compile test each patch.

If nothing else, at least this answers the question of whether a
respin is worthwhile (this alone requires a respin).

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index f03d3eb0752c..9e79645e5c0e 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -1458,6 +1466,7 @@ static void audit_log_exit(void)
>         struct audit_buffer *ab;
>         struct audit_aux_data *aux;
>         struct audit_names *n;
> +       struct audit_contobj *cont;
>
>         context->personality = current->personality;
>
> @@ -1541,7 +1550,7 @@ static void audit_log_exit(void)
>         for (aux = context->aux_pids; aux; aux = aux->next) {
>                 struct audit_aux_data_pids *axs = (void *)aux;
>
> -               for (i = 0; i < axs->pid_count; i++)
> +               for (i = 0; i < axs->pid_count; i++) {
>                         if (audit_log_pid_context(context, axs->target_pid[i],
>                                                   axs->target_auid[i],
>                                                   axs->target_uid[i],
> @@ -1549,14 +1558,20 @@ static void audit_log_exit(void)
>                                                   axs->target_sid[i],
>                                                   axs->target_comm[i]))
>                                 call_panic = 1;
> +                       audit_log_container_id(context, axs->target_cid[i]);
> +               }

It might be nice to see an audit event example including the
ptrace/signal information.  I'm concerned there may be some confusion
about associating the different audit container IDs with the correct
information in the event.

>         }
>
> -       if (context->target_pid &&
> -           audit_log_pid_context(context, context->target_pid,
> -                                 context->target_auid, context->target_uid,
> -                                 context->target_sessionid,
> -                                 context->target_sid, context->target_comm))
> +       if (context->target_pid) {
> +               if (audit_log_pid_context(context, context->target_pid,
> +                                         context->target_auid,
> +                                         context->target_uid,
> +                                         context->target_sessionid,
> +                                         context->target_sid,
> +                                         context->target_comm))
>                         call_panic = 1;
> +               audit_log_container_id(context, context->target_cid);
> +       }
>
>         if (context->pwd.dentry && context->pwd.mnt) {
>                 ab = audit_log_start(context, GFP_KERNEL, AUDIT_CWD);
> @@ -1575,6 +1590,14 @@ static void audit_log_exit(void)
>
>         audit_log_proctitle();
>
> +       rcu_read_lock();
> +       cont = _audit_contobj_get(current);
> +       rcu_read_unlock();
> +       audit_log_container_id(context, cont);
> +       rcu_read_lock();
> +       _audit_contobj_put(cont);
> +       rcu_read_unlock();

Do we need to grab an additional reference for the audit container
object here?  We don't create any additional references here that
persist beyond the lifetime of this function, right?


>         audit_log_container_drop();
>
>         /* Send end of event record to help user space know we are finished */

--
paul moore
www.paul-moore.com

* Re: [PATCH ghak90 V9 06/13] audit: add contid support for signalling the audit daemon
  2020-06-27 13:20 ` [PATCH ghak90 V9 06/13] audit: add contid support for signalling the audit daemon Richard Guy Briggs
@ 2020-07-05 15:10   ` Paul Moore
  2020-07-29 19:00     ` Richard Guy Briggs
  0 siblings, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:10 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Add audit container identifier support to the action of signalling the
> audit daemon.
>
> Since this would need to add an element to the audit_sig_info struct,
> a new record type AUDIT_SIGNAL_INFO2 was created with a new
> audit_sig_info2 struct.  Corresponding support is required in the
> userspace code to reflect the new record request and reply type.
> An older userspace won't break since it won't know to request this
> record type.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h       |  8 ++++
>  include/uapi/linux/audit.h  |  1 +
>  kernel/audit.c              | 95 ++++++++++++++++++++++++++++++++++++++++++++-
>  security/selinux/nlmsgtab.c |  1 +
>  4 files changed, 104 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 5eeba0efffc2..89cf7c66abe6 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -22,6 +22,13 @@ struct audit_sig_info {
>         char            ctx[];
>  };
>
> +struct audit_sig_info2 {
> +       uid_t           uid;
> +       pid_t           pid;
> +       u32             cid_len;
> +       char            data[];
> +};
> +
>  struct audit_buffer;
>  struct audit_context;
>  struct inode;
> @@ -105,6 +112,7 @@ struct audit_contobj {
>         u64                     id;
>         struct task_struct      *owner;
>         refcount_t              refcount;
> +       refcount_t              sigflag;
>         struct rcu_head         rcu;
>  };

It seems like we need some protection in audit_set_contid() so that we
don't allow reuse of an audit container ID when "refcount == 0 &&
sigflag != 0", yes?

> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index fd98460c983f..a56ad77069b9 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -72,6 +72,7 @@
>  #define AUDIT_SET_FEATURE      1018    /* Turn an audit feature on or off */
>  #define AUDIT_GET_FEATURE      1019    /* Get which features are enabled */
>  #define AUDIT_CONTAINER_OP     1020    /* Define the container id and info */
> +#define AUDIT_SIGNAL_INFO2     1021    /* Get info auditd signal sender */
>
>  #define AUDIT_FIRST_USER_MSG   1100    /* Userspace messages mostly uninteresting to kernel */
>  #define AUDIT_USER_AVC         1107    /* We filter this differently */
> diff --git a/kernel/audit.c b/kernel/audit.c
> index a09f8f661234..54dd2cb69402 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -126,6 +126,8 @@ struct auditd_connection {
>  kuid_t         audit_sig_uid = INVALID_UID;
>  pid_t          audit_sig_pid = -1;
>  u32            audit_sig_sid = 0;
> +static struct audit_contobj *audit_sig_cid;
> +static struct task_struct *audit_sig_atsk;

This looks like a typo, or did you mean "atsk" for some reason?

>  /* Records can be lost in several ways:
>     0) [suppressed in audit_alloc]
> @@ -239,7 +241,33 @@ void _audit_contobj_put(struct audit_contobj *cont)
>  {
>         if (!cont)
>                 return;
> -       if (refcount_dec_and_test(&cont->refcount)) {
> +       if (refcount_dec_and_test(&cont->refcount) && !refcount_read(&cont->sigflag)) {
> +               put_task_struct(cont->owner);
> +               list_del_rcu(&cont->list);
> +               kfree_rcu(cont, rcu);
> +       }
> +}

It seems like it might be a good idea to modify the corresponding
_get() to WARN on the reuse of audit container objects where refcount
is zero, similar to the comment I made above.  What do you think?

> +/* rcu_read_lock must be held by caller unless new */
> +static struct audit_contobj *_audit_contobj_get_sig(struct task_struct *tsk)
> +{
> +       struct audit_contobj *cont;
> +
> +       if (!tsk->audit)
> +               return NULL;
> +       cont = tsk->audit->cont;
> +       if (cont)
> +               refcount_set(&cont->sigflag, 1);
> +       return cont;
> +}

If you are going to use a refcount and call this a "get" function you
might as well make it do an increment and not just a set(1).  It's a bit
silly with just one auditd per system, but I suppose it will make more
sense when we have multiple audit daemons.  In a related comment, you
probably want to rename "sigflag" to "sigcount" or similar.

In summary, it's either a reference that supports multiple gets/puts
or it's a flag with just an on/off; it shouldn't attempt to straddle
both, that's both confusing and fragile.
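
As a rough userspace sketch of the counted variant described above:
the struct, the "sigcount" field and the helper names below are
hypothetical, plain ints stand in for refcount_t, and locking/RCU is
ignored entirely.  The point is only that each get is paired with one
put and the object is released when both counters reach zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct contobj_sketch {
	unsigned long long id;
	int refcount;	/* tasks using the container */
	int sigcount;	/* pending signal-info references */
	bool freed;	/* stands in for list_del_rcu() + kfree_rcu() */
};

/* Release only once neither kind of reference remains. */
static void maybe_release(struct contobj_sketch *c)
{
	if (c->refcount == 0 && c->sigcount == 0)
		c->freed = true;
}

struct contobj_sketch *contobj_get_sig(struct contobj_sketch *c)
{
	if (c)
		c->sigcount++;	/* increment, not set(1) */
	return c;
}

void contobj_put_sig(struct contobj_sketch *c)
{
	if (!c)
		return;
	assert(c->sigcount > 0);
	c->sigcount--;
	maybe_release(c);
}

void contobj_put(struct contobj_sketch *c)
{
	if (!c)
		return;
	assert(c->refcount > 0);
	c->refcount--;
	maybe_release(c);
}
```

With a flag instead of a counter, the second put_sig in a
get/get/put/put sequence would have nothing left to clear, which is the
straddling problem mentioned above.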

> +/* rcu_read_lock must be held by caller */
> +static void _audit_contobj_put_sig(struct audit_contobj *cont)
> +{
> +       if (!cont)
> +               return;
> +       refcount_set(&cont->sigflag, 0);
> +       if (!refcount_read(&cont->refcount)) {
>                 put_task_struct(cont->owner);
>                 list_del_rcu(&cont->list);
>                 kfree_rcu(cont, rcu);
> @@ -309,6 +337,13 @@ void audit_free(struct task_struct *tsk)
>         info = tsk->audit;
>         tsk->audit = NULL;
>         kmem_cache_free(audit_task_cache, info);
> +       rcu_read_lock();
> +       if (audit_sig_atsk == tsk) {
> +               _audit_contobj_put_sig(audit_sig_cid);
> +               audit_sig_cid = NULL;
> +               audit_sig_atsk = NULL;
> +       }
> +       rcu_read_unlock();
>  }
>
>  /**
> @@ -1132,6 +1167,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
>         case AUDIT_ADD_RULE:
>         case AUDIT_DEL_RULE:
>         case AUDIT_SIGNAL_INFO:
> +       case AUDIT_SIGNAL_INFO2:
>         case AUDIT_TTY_GET:
>         case AUDIT_TTY_SET:
>         case AUDIT_TRIM:
> @@ -1294,6 +1330,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>         struct audit_buffer     *ab;
>         u16                     msg_type = nlh->nlmsg_type;
>         struct audit_sig_info   *sig_data;
> +       struct audit_sig_info2  *sig_data2;
>         char                    *ctx = NULL;
>         u32                     len;
>
> @@ -1559,6 +1596,52 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                                  sig_data, sizeof(*sig_data) + len);
>                 kfree(sig_data);
>                 break;
> +       case AUDIT_SIGNAL_INFO2: {
> +               unsigned int contidstrlen = 0;
> +
> +               len = 0;
> +               if (audit_sig_sid) {
> +                       err = security_secid_to_secctx(audit_sig_sid, &ctx,
> +                                                      &len);
> +                       if (err)
> +                               return err;
> +               }
> +               if (audit_sig_cid) {
> +                       contidstr = kmalloc(21, GFP_KERNEL);
> +                       if (!contidstr) {
> +                               if (audit_sig_sid)
> +                                       security_release_secctx(ctx, len);
> +                               return -ENOMEM;
> +                       }
> +                       contidstrlen = scnprintf(contidstr, 20, "%llu", audit_sig_cid->id);
> +               }
> +               sig_data2 = kmalloc(sizeof(*sig_data2) + contidstrlen + len, GFP_KERNEL);
> +               if (!sig_data2) {
> +                       if (audit_sig_sid)
> +                               security_release_secctx(ctx, len);
> +                       kfree(contidstr);
> +                       return -ENOMEM;
> +               }
> +               sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
> +               sig_data2->pid = audit_sig_pid;
> +               if (audit_sig_cid) {
> +                       memcpy(sig_data2->data, contidstr, contidstrlen);
> +                       sig_data2->cid_len = contidstrlen;
> +                       kfree(contidstr);
> +               }
> +               if (audit_sig_sid) {
> +                       memcpy(sig_data2->data + contidstrlen, ctx, len);
> +                       security_release_secctx(ctx, len);
> +               }
> +               rcu_read_lock();
> +               _audit_contobj_put_sig(audit_sig_cid);
> +               rcu_read_unlock();

We probably want to drop the reference in the legacy/AUDIT_SIGNAL_INFO
case too, right?

> +               audit_sig_cid = NULL;
> +               audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
> +                                sig_data2, sizeof(*sig_data2) + contidstrlen + len);
> +               kfree(sig_data2);
> +               break;
> +       }
>         case AUDIT_TTY_GET: {
>                 struct audit_tty_status s;
>                 unsigned int t;
> @@ -2470,6 +2553,11 @@ int audit_signal_info(int sig, struct task_struct *t)
>                 else
>                         audit_sig_uid = uid;
>                 security_task_getsecid(current, &audit_sig_sid);
> +               rcu_read_lock();
> +               _audit_contobj_put_sig(audit_sig_cid);
> +               audit_sig_cid = _audit_contobj_get_sig(current);
> +               rcu_read_unlock();
> +               audit_sig_atsk = t;
>         }
>
>         return audit_signal_info_syscall(t);
> @@ -2532,6 +2620,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
>                         if (cont->id == contid) {
>                                 /* task injection to existing container */
>                                 if (current == cont->owner) {
> +                                       if (!refcount_read(&cont->refcount)) {
> +                                               rc = -ESHUTDOWN;

Reuse -ENOTUNIQ; I'm not overly excited about providing a lot of
detail here as these are global system objects.  If you must have a
different errno (and I would prefer you didn't), use something like
-EBUSY.


> +                                               spin_unlock(&audit_contobj_list_lock);
> +                                               goto conterror;
> +                                       }
>                                         _audit_contobj_hold(cont);
>                                         newcont = cont;
>                                 } else {
> diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
> index b69231918686..8303bb7a63d0 100644
> --- a/security/selinux/nlmsgtab.c
> +++ b/security/selinux/nlmsgtab.c
> @@ -137,6 +137,7 @@ struct nlmsg_perm {
>         { AUDIT_DEL_RULE,       NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
>         { AUDIT_USER,           NETLINK_AUDIT_SOCKET__NLMSG_RELAY    },
>         { AUDIT_SIGNAL_INFO,    NETLINK_AUDIT_SOCKET__NLMSG_READ     },
> +       { AUDIT_SIGNAL_INFO2,   NETLINK_AUDIT_SOCKET__NLMSG_READ     },
>         { AUDIT_TRIM,           NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
>         { AUDIT_MAKE_EQUIV,     NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
>         { AUDIT_TTY_GET,        NETLINK_AUDIT_SOCKET__NLMSG_READ     },

--
paul moore
www.paul-moore.com

* Re: [PATCH ghak90 V9 07/13] audit: add support for non-syscall auxiliary records
  2020-06-27 13:20 ` [PATCH ghak90 V9 07/13] audit: add support for non-syscall auxiliary records Richard Guy Briggs
@ 2020-07-05 15:11   ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:11 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Standalone audit records have the timestamp and serial number generated
> on the fly and as such are unique, making them standalone.  This new
> function audit_alloc_local() generates a local audit context that will
> be used only for a standalone record and its auxiliary record(s).  The
> context is discarded immediately after the local associated records are
> produced.

We've had some good discussions on the list about why we can't reuse
the "in_syscall" field and need to add a "local" field, I think it
would be good to address that here in the commit description.

> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Serge Hallyn <serge@hallyn.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  include/linux/audit.h |  8 ++++++++
>  kernel/audit.h        |  1 +
>  kernel/auditsc.c      | 33 ++++++++++++++++++++++++++++-----
>  3 files changed, 37 insertions(+), 5 deletions(-)

...

> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 9e79645e5c0e..935eb3d2cde9 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -908,11 +908,13 @@ static inline void audit_free_aux(struct audit_context *context)
>         }
>  }
>
> -static inline struct audit_context *audit_alloc_context(enum audit_state state)
> +static inline struct audit_context *audit_alloc_context(enum audit_state state,
> +                                                       gfp_t gfpflags)
>  {
>         struct audit_context *context;
>
> -       context = kzalloc(sizeof(*context), GFP_KERNEL);
> +       /* We can be called in atomic context via audit_tg() */

At this point I think it's clear we need a respin so I'm not going to
preface all of my nitpick comments as such, although this definitely
would qualify ...

I don't believe audit_tg() exists yet; it is likely coming later in
this patchset, so please remove this comment as it doesn't make sense
in this context.

To be frank, don't re-add the comment later in the patchset either.
Comments like these tend to be fragile and don't really add any great
insight.  The audit_tg() function can, and most likely will, be
modified at some point in the future such that the comment above no
longer applies, and there is a reasonable chance that when it does the
above comment will not be updated.  Further, anyone modifying
audit_alloc_context() is going to look at the callers (or rather, they
*should* look at the callers) and will notice the no-sleep
requirements.

> @@ -960,8 +963,27 @@ int audit_alloc_syscall(struct task_struct *tsk)
>         return 0;
>  }
>
> -static inline void audit_free_context(struct audit_context *context)
> +struct audit_context *audit_alloc_local(gfp_t gfpflags)
>  {
> +       struct audit_context *context = NULL;
> +
> +       context = audit_alloc_context(AUDIT_RECORD_CONTEXT, gfpflags);
> +       if (!context) {
> +               audit_log_lost("out of memory in audit_alloc_local");
> +               goto out;

You might as well just return NULL here, no need to jump and then return NULL.
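
For illustration, the early-return shape could look like the userspace
sketch below.  The names are hypothetical (struct ctx_sketch,
alloc_local_sketch(), the lost_records counter standing in for
audit_log_lost()), and the simulate_oom parameter exists only so the
failure path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct ctx_sketch {
	unsigned int serial;
	bool local;
};

static unsigned int next_serial = 1;
static int lost_records;	/* stands in for audit_log_lost() */

struct ctx_sketch *alloc_local_sketch(bool simulate_oom)
{
	struct ctx_sketch *ctx;

	ctx = simulate_oom ? NULL : calloc(1, sizeof(*ctx));
	if (!ctx) {
		lost_records++;
		return NULL;	/* no label needed for a single failure */
	}
	ctx->serial = next_serial++;
	ctx->local = true;
	return ctx;
}
```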


> +       }
> +       context->serial = audit_serial();
> +       ktime_get_coarse_real_ts64(&context->ctime);
> +       context->local = true;
> +out:
> +       return context;
> +}
> +EXPORT_SYMBOL(audit_alloc_local);

--
paul moore
www.paul-moore.com

* Re: [PATCH ghak90 V9 08/13] audit: add containerid support for user records
  2020-06-27 13:20 ` [PATCH ghak90 V9 08/13] audit: add containerid support for user records Richard Guy Briggs
@ 2020-07-05 15:11   ` Paul Moore
  2020-07-18  0:43     ` Richard Guy Briggs
  0 siblings, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:11 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Add audit container identifier auxiliary record to user event standalone
> records.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  kernel/audit.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 54dd2cb69402..997c34178ee8 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1507,6 +1504,14 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                                 audit_log_n_untrustedstring(ab, str, data_len);
>                         }
>                         audit_log_end(ab);
> +                       rcu_read_lock();
> +                       cont = _audit_contobj_get(current);
> +                       rcu_read_unlock();
> +                       audit_log_container_id(context, cont);
> +                       rcu_read_lock();
> +                       _audit_contobj_put(cont);
> +                       rcu_read_unlock();
> +                       audit_free_context(context);

I haven't searched the entire patchset, but it seems like the pattern
above happens a couple of times in this patchset, yes?  If so would it
make sense to wrap the above get/log/put in a helper function?

Not a big deal either way, I'm pretty neutral on it at this point in
the patchset but thought it might be worth mentioning in case you
noticed the same and were on the fence.
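
A minimal userspace sketch of what such a helper could look like,
purely to show the shape.  Everything here is a hypothetical model:
struct contobj, struct task, contobj_get(), contobj_put(),
log_container_id() and log_task_container_id() are illustrative names,
a plain int stands in for the real reference count, and the RCU
locking from the patch is left out entirely:

```c
#include <stdio.h>

/* Hypothetical stand-in for struct audit_contobj: an id plus a reference count. */
struct contobj {
	unsigned long long id;
	int refcount;
};

/* Hypothetical stand-in for a task carrying a container object. */
struct task {
	struct contobj *cont;
};

/* Take a reference; models _audit_contobj_get(). */
static struct contobj *contobj_get(struct task *tsk)
{
	if (tsk->cont)
		tsk->cont->refcount++;
	return tsk->cont;
}

/* Drop a reference; models _audit_contobj_put(). */
static void contobj_put(struct contobj *cont)
{
	if (cont)
		cont->refcount--;
}

/* Models audit_log_container_id(); returns the number of characters written. */
static int log_container_id(char *buf, size_t len, const struct contobj *cont)
{
	if (!cont)
		return 0;
	return snprintf(buf, len, "contid=%llu", cont->id);
}

/* The helper: one call replaces the open-coded get/log/put sequence. */
static int log_task_container_id(char *buf, size_t len, struct task *tsk)
{
	struct contobj *cont = contobj_get(tsk);
	int n = log_container_id(buf, len, cont);

	contobj_put(cont);
	return n;
}
```

Callers then make a single call, and the get/put pairing cannot drift
apart between call sites.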

--
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 10/13] audit: add support for containerid to network namespaces
  2020-06-27 13:20 ` [PATCH ghak90 V9 10/13] audit: add support for containerid to network namespaces Richard Guy Briggs
@ 2020-07-05 15:11   ` Paul Moore
  2020-07-21 22:05     ` Richard Guy Briggs
  0 siblings, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:11 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> This also adds support to qualify NETFILTER_PKT records.
>
> Audit events could happen in a network namespace outside of a task
> context due to packets received from the net that trigger an auditing
> rule prior to being associated with a running task.  The network
> namespace could be in use by multiple containers by association to the
> tasks in that network namespace.  We still want a way to attribute
> these events to any potential containers.  Keep a list per network
> namespace to track these audit container identifiers.
>
> Add/increment the audit container identifier on:
> - initial setting of the audit container identifier via /proc
> - clone/fork call that inherits an audit container identifier
> - unshare call that inherits an audit container identifier
> - setns call that inherits an audit container identifier
> Delete/decrement the audit container identifier on:
> - an inherited audit container identifier dropped when child set
> - process exit
> - unshare call that drops a net namespace
> - setns call that drops a net namespace
>
> Add audit container identifier auxiliary record(s) to NETFILTER_PKT
> event standalone records.  Iterate through all potential audit container
> identifiers associated with a network namespace.
>
> Please see the github audit kernel issue for contid net support:
>   https://github.com/linux-audit/audit-kernel/issues/92
> Please see the github audit testsuite issue for the test case:
>   https://github.com/linux-audit/audit-testsuite/issues/64
> Please see the github audit wiki for the feature overview:
>   https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> ---
>  include/linux/audit.h    |  20 ++++++
>  kernel/audit.c           | 156 ++++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/nsproxy.c         |   4 ++
>  net/netfilter/nft_log.c  |  11 +++-
>  net/netfilter/xt_AUDIT.c |  11 +++-
>  5 files changed, 195 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index c4a755ae0d61..304fbb7c3c5b 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -128,6 +128,13 @@ struct audit_task_info {
>
>  extern struct audit_task_info init_struct_audit;
>
> +struct audit_contobj_netns {
> +       struct list_head        list;
> +       struct audit_contobj    *obj;
> +       int                     count;

This seems like it might be a good candidate for refcount_t, yes?

> +       struct rcu_head         rcu;
> +};

...

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 997c34178ee8..a862721dfd9b 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -437,6 +452,136 @@ static struct sock *audit_get_sk(const struct net *net)
>         return aunet->sk;
>  }
>
> +void audit_netns_contid_add(struct net *net, struct audit_contobj *cont)
> +{
> +       struct audit_net *aunet;
> +       struct list_head *contobj_list;
> +       struct audit_contobj_netns *contns;
> +
> +       if (!net)
> +               return;
> +       if (!cont)
> +               return;
> +       aunet = net_generic(net, audit_net_id);
> +       if (!aunet)
> +               return;
> +       contobj_list = &aunet->contobj_list;
> +       rcu_read_lock();
> +       spin_lock(&aunet->contobj_list_lock);
> +       list_for_each_entry_rcu(contns, contobj_list, list)
> +               if (contns->obj == cont) {
> +                       contns->count++;
> +                       goto out;
> +               }
> +       contns = kmalloc(sizeof(*contns), GFP_ATOMIC);
> +       if (contns) {
> +               INIT_LIST_HEAD(&contns->list);
> +               contns->obj = cont;
> +               contns->count = 1;
> +               list_add_rcu(&contns->list, contobj_list);
> +       }
> +out:
> +       spin_unlock(&aunet->contobj_list_lock);
> +       rcu_read_unlock();
> +}
> +
> +void audit_netns_contid_del(struct net *net, struct audit_contobj *cont)
> +{
> +       struct audit_net *aunet;
> +       struct list_head *contobj_list;
> +       struct audit_contobj_netns *contns = NULL;
> +
> +       if (!net)
> +               return;
> +       if (!cont)
> +               return;
> +       aunet = net_generic(net, audit_net_id);
> +       if (!aunet)
> +               return;
> +       contobj_list = &aunet->contobj_list;
> +       rcu_read_lock();
> +       spin_lock(&aunet->contobj_list_lock);
> +       list_for_each_entry_rcu(contns, contobj_list, list)
> +               if (contns->obj == cont) {
> +                       contns->count--;
> +                       if (contns->count < 1) {

One could simplify this with "(--contns->count) < 1", although if it
is changed to a refcount_t (which seems like a smart thing), the
normal decrement/test would be the best choice.
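
For anyone unfamiliar with the decrement/test idiom, here is a small
userspace model using C11 atomics.  It is only a sketch: struct
refcount and the *_model() functions are hypothetical names that mimic
the shape of refcount_set(), refcount_inc() and
refcount_dec_and_test(), and the model leaves out the saturation
behaviour of the real refcount_t:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's refcount_t. */
struct refcount {
	atomic_int refs;
};

static void refcount_set_model(struct refcount *r, int n)
{
	atomic_store(&r->refs, n);
}

static void refcount_inc_model(struct refcount *r)
{
	atomic_fetch_add(&r->refs, 1);
}

/* Returns true when the caller dropped the last reference. */
static bool refcount_dec_and_test_model(struct refcount *r)
{
	return atomic_fetch_sub(&r->refs, 1) == 1;
}
```

In the delete path, the list removal and free would then sit under the
"dropped the last reference" branch instead of a separate decrement
followed by a comparison.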


> +                               list_del_rcu(&contns->list);
> +                               kfree_rcu(contns, rcu);
> +                       }
> +                       break;
> +               }
> +       spin_unlock(&aunet->contobj_list_lock);
> +       rcu_read_unlock();
> +}

--
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 11/13] audit: contid check descendancy and nesting
  2020-06-27 13:20 ` [PATCH ghak90 V9 11/13] audit: contid check descendancy and nesting Richard Guy Briggs
@ 2020-07-05 15:11   ` Paul Moore
  2020-08-07 17:10     ` Richard Guy Briggs
  0 siblings, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:11 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Require the target task to be a descendant of the container
> orchestrator/engine.
>
> You would only change the audit container ID from one set or inherited
> value to another if you were nesting containers.
>
> If changing the contid, the container orchestrator/engine must be a
> descendant and not same orchestrator as the one that set it so it is not
> possible to change the contid of another orchestrator's container.
>
> Since the task_is_descendant() function is used in YAMA and in audit,
> remove the duplication and pull the function into kernel/sched/core.c
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/sched.h    |  3 +++
>  kernel/audit.c           | 23 +++++++++++++++++++++--
>  kernel/sched/core.c      | 33 +++++++++++++++++++++++++++++++++
>  security/yama/yama_lsm.c | 33 ---------------------------------
>  4 files changed, 57 insertions(+), 35 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2213ac670386..06938d0b9e0c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2047,4 +2047,7 @@ static inline void rseq_syscall(struct pt_regs *regs)
>
>  const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
>
> +extern int task_is_descendant(struct task_struct *parent,
> +                             struct task_struct *child);
> +
>  #endif
> diff --git a/kernel/audit.c b/kernel/audit.c
> index a862721dfd9b..efa65ec01239 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -2713,6 +2713,20 @@ int audit_signal_info(int sig, struct task_struct *t)
>         return audit_signal_info_syscall(t);
>  }
>
> +static bool audit_contid_isnesting(struct task_struct *tsk)
> +{
> +       bool isowner = false;
> +       bool ownerisparent = false;
> +
> +       rcu_read_lock();
> +       if (tsk->audit && tsk->audit->cont) {
> +               isowner = current == tsk->audit->cont->owner;
> +               ownerisparent = task_is_descendant(tsk->audit->cont->owner, current);

I want to make sure I'm understanding this correctly and I keep
mentally tripping over something: it seems like for a given audit
container ID a task is either the owner or a descendent, there is no
third state, is that correct?

Assuming that is true, can the descendent check simply be a negative
owner check given they both have the same audit container ID?

> +       }
> +       rcu_read_unlock();
> +       return !isowner && ownerisparent;
> +}
> +
>  /*
>   * audit_set_contid - set current task's audit contid
>   * @task: target task
> @@ -2755,8 +2769,13 @@ int audit_set_contid(struct task_struct *task, u64 contid)
>                 rc = -EBUSY;
>                 goto unlock;
>         }
> -       /* if contid is already set, deny */
> -       if (audit_contid_set(task))
> +       /* if task is not descendant, block */
> +       if (task == current || !task_is_descendant(current, task)) {

I'm also still fuzzy on why we can't let a task set its own audit
container ID, assuming it meets all the criteria established in patch
2/13.  It somewhat made sense when you were tracking inherited vs
explicitly set audit container IDs, but that doesn't appear to be the
case so far in this patchset, yes?

> +               rc = -EXDEV;

I'm fairly confident we had a discussion about not using all these
different error codes, but that may be a moot point given my next
comment.

> +               goto unlock;
> +       }
> +       /* only allow contid setting again if nesting */
> +       if (audit_contid_set(task) && !audit_contid_isnesting(task))
>                 rc = -EEXIST;

It seems like what we need in audit_set_contid() is a check to ensure
that the task being modified is only modified by the owner of the
audit container ID, yes?  If so, I would think we could do this quite
easily with the following, or similar logic, (NOTE: assumes both
current and tsk are properly setup):

  if ((current->audit->cont != tsk->audit->cont) ||
      (current->audit->cont->owner != current))
    return -EACCES;

This is somewhat independent of the above issue, but we may also want
to add to the capability check.  Patch 2 adds a
"capable(CAP_AUDIT_CONTROL)" which is good, but perhaps we also need a
"ns_capable(CAP_AUDIT_CONTROL)" to allow a given audit container ID
orchestrator/owner the ability to control which of its descendants
can change their audit container ID, for example:

  if (!capable(CAP_AUDIT_CONTROL) ||
      !ns_capable(current_user_ns(), CAP_AUDIT_CONTROL))
    return -EPERM;

--
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 12/13] audit: track container nesting
  2020-06-27 13:20 ` [PATCH ghak90 V9 12/13] audit: track container nesting Richard Guy Briggs
@ 2020-07-05 15:11   ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:11 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Track the parent container of a container to be able to filter and
> report nesting.
>
> Now that we have a way to track and check the parent container of a
> container, modify the contid field format to be able to report that
> nesting using a caret ("^") modifier to indicate nesting.  The
> original field format was "contid=<contid>" for task-associated records
> and "contid=<contid>[,<contid>[...]]" for network-namespace-associated
> records.  The new field format is
> "contid=<contid>[,^<contid>[...]][,<contid>[...]]".

I feel like this is a case which could really benefit from an example
in the commit description showing multiple levels of nesting, with
some leaf audit container IDs at each level.  This way we have a
canonical example for people who want to understand how to parse the
list and properly sort out the inheritance.
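
To make the parsing side concrete, here is a userspace sketch of a
tokenizer for the value part of the field.  It only splits the
comma-separated list and records which entries carry the "^" marker;
how a marked entry relates to its neighbours is exactly what the
commit description would need to spell out, so the sketch makes no
attempt to rebuild the hierarchy.  struct contid_entry and
contid_parse() are hypothetical names, and the assumption that IDs are
printed in decimal is mine:

```c
#include <stdbool.h>
#include <stdlib.h>

struct contid_entry {
	unsigned long long id;
	bool nested;	/* entry was written with a leading '^' */
};

/*
 * Split a comma-separated contid value such as "1,^2,3" into entries.
 * Returns the number of entries stored, or -1 on malformed input.
 */
static int contid_parse(const char *val, struct contid_entry *out, int max)
{
	int n = 0;

	while (*val) {
		char *end;
		bool nested = false;

		if (n == max)
			return -1;
		if (*val == '^') {
			nested = true;
			val++;
		}
		if (*val < '0' || *val > '9')
			return -1;
		out[n].id = strtoull(val, &end, 10);
		out[n].nested = nested;
		n++;
		if (*end == ',') {
			if (end[1] == '\0')
				return -1;	/* trailing comma */
			val = end + 1;
		} else if (*end == '\0') {
			val = end;
		} else {
			return -1;
		}
	}
	return n;
}
```

The IDs in the "1,^2,3" comment are made up for illustration and do
not come from any real audit log.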


> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h |  1 +
>  kernel/audit.c        | 60 ++++++++++++++++++++++++++++++++++++++++++---------
>  kernel/audit.h        |  2 ++
>  kernel/auditfilter.c  | 17 ++++++++++++++-
>  kernel/auditsc.c      |  2 +-
>  5 files changed, 70 insertions(+), 12 deletions(-)

--
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 13/13] audit: add capcontid to set contid outside init_user_ns
  2020-06-27 13:20 ` [PATCH ghak90 V9 13/13] audit: add capcontid to set contid outside init_user_ns Richard Guy Briggs
@ 2020-07-05 15:11   ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-07-05 15:11 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Sat, Jun 27, 2020 at 9:24 AM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Provide a mechanism similar to CAP_AUDIT_CONTROL to explicitly give a
> process in a non-init user namespace the capability to set audit
> container identifiers of individual children.
>
> Provide the /proc/$PID/audit_capcontid interface to capcontid.
> Valid values are: 1==enabled, 0==disabled
>
> Writing a "1" to this special file for the target process $PID will
> enable the target process to set audit container identifiers of its
> descendants.
>
> A process must already have CAP_AUDIT_CONTROL in the initial user
> namespace or have had audit_capcontid enabled by a previous use of this
> feature by its parent on this process in order to be able to enable it
> for another process.  The target process must be a descendant of the
> calling process.
>
> Report this action in new message type AUDIT_SET_CAPCONTID 1022 with
> fields opid= capcontid= old-capcontid=
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  fs/proc/base.c             | 57 +++++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/audit.h      | 14 ++++++++++++
>  include/uapi/linux/audit.h |  1 +
>  kernel/audit.c             | 38 ++++++++++++++++++++++++++++++-
>  4 files changed, 108 insertions(+), 2 deletions(-)

This seems very similar to the capable/ns_capable combination I
mentioned in patch 11/13; any reasons why you feel that this might be
a better approach?  My current thinking is that the capable/ns_capable
approach is preferable as it leverages existing kernel mechanisms and
doesn't require us to reinvent the wheel in the audit subsystem.


--
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 01/13] audit: collect audit task parameters
  2020-07-05 15:09   ` Paul Moore
@ 2020-07-07  2:50     ` Richard Guy Briggs
  2020-07-08  1:42       ` Paul Moore
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-07-07  2:50 UTC (permalink / raw)
  To: Paul Moore
  Cc: nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo,
	netdev, linux-fsdevel, Eric Paris, mpatel, Serge Hallyn

On 2020-07-05 11:09, Paul Moore wrote:
> On Sat, Jun 27, 2020 at 9:21 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > The audit-related parameters in struct task_struct should ideally be
> > collected together and accessed through a standard audit API.
> >
> > Collect the existing loginuid, sessionid and audit_context together in a
> > new struct audit_task_info called "audit" in struct task_struct.
> >
> > Use kmem_cache to manage this pool of memory.
> > Un-inline audit_free() to be able to always recover that memory.
> >
> > Please see the upstream github issue
> > https://github.com/linux-audit/audit-kernel/issues/81
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >  include/linux/audit.h | 49 +++++++++++++++++++++++------------
> >  include/linux/sched.h |  7 +----
> >  init/init_task.c      |  3 +--
> >  init/main.c           |  2 ++
> >  kernel/audit.c        | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--
> >  kernel/audit.h        |  5 ++++
> >  kernel/auditsc.c      | 26 ++++++++++---------
> >  kernel/fork.c         |  1 -
> >  8 files changed, 124 insertions(+), 40 deletions(-)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 3fcd9ee49734..c2150415f9df 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -100,6 +100,16 @@ enum audit_nfcfgop {
> >         AUDIT_XT_OP_UNREGISTER,
> >  };
> >
> > +struct audit_task_info {
> > +       kuid_t                  loginuid;
> > +       unsigned int            sessionid;
> > +#ifdef CONFIG_AUDITSYSCALL
> > +       struct audit_context    *ctx;
> > +#endif
> > +};
> > +
> > +extern struct audit_task_info init_struct_audit;
> > +
> >  extern int is_audit_feature_set(int which);
> >
> >  extern int __init audit_register_class(int class, unsigned *list);
> 
> ...
> 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index b62e6aaf28f0..2213ac670386 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -34,7 +34,6 @@
> >  #include <linux/kcsan.h>
> >
> >  /* task_struct member predeclarations (sorted alphabetically): */
> > -struct audit_context;
> >  struct backing_dev_info;
> >  struct bio_list;
> >  struct blk_plug;
> > @@ -946,11 +945,7 @@ struct task_struct {
> >         struct callback_head            *task_works;
> >
> >  #ifdef CONFIG_AUDIT
> > -#ifdef CONFIG_AUDITSYSCALL
> > -       struct audit_context            *audit_context;
> > -#endif
> > -       kuid_t                          loginuid;
> > -       unsigned int                    sessionid;
> > +       struct audit_task_info          *audit;
> >  #endif
> >         struct seccomp                  seccomp;
> 
> > In the early days of this patchset we talked a lot about how to handle
> > the task_struct and the changes that would be necessary, ultimately
> > deciding to encapsulate all of the audit fields in an
> > audit_task_info struct.  However, what is puzzling me a bit at this
> > moment is why we are only including audit_task_info in task_struct by
> > reference *and* making it a build time conditional (via CONFIG_AUDIT).
> 
> > If audit is enabled at build time it would seem that we are always
> > going to allocate an audit_task_info struct, so I have to wonder why
> > we don't simply embed it inside the task_struct (similar to the
> > seccomp struct in the snippet above)?  Of course the audit_context
> > struct needs to remain as is, I'm talking only about the
> > task_struct/audit_task_info struct.

I agree that including the audit_task_info struct in the struct
task_struct would have been preferred to simplify allocation and free,
but the reason it was included by reference instead was to make the
task_struct size independent of audit so that future changes would not
cause as many kABI challenges.  This first change will cause kABI
challenges regardless, but it was future ones that we were trying to
ease.
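
The size argument can be sketched in userspace as follows.  All struct
names and fields below are hypothetical simplifications rather than
the real kernel layouts; the point is only that growing the info block
changes the size and member offsets of an embedding struct but not of
one that holds a pointer:

```c
#include <stddef.h>

/* Version 1 of a hypothetical audit info block. */
struct info_v1 {
	unsigned int loginuid;
	unsigned int sessionid;
};

/* Version 2 grows by one field, as a later change might. */
struct info_v2 {
	unsigned int loginuid;
	unsigned int sessionid;
	unsigned long long contid;
};

/* Embedded: the containing struct grows whenever the info block grows. */
struct task_embed_v1 { long before; struct info_v1 audit; long after; };
struct task_embed_v2 { long before; struct info_v2 audit; long after; };

/* By reference: size and member offsets do not depend on the info block. */
struct task_ref_v1 { long before; struct info_v1 *audit; long after; };
struct task_ref_v2 { long before; struct info_v2 *audit; long after; };
```

Comparing sizeof() and offsetof() of the "after" member across the two
versions shows the by-reference layout staying put while the embedded
one shifts.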

Does that match with your recollection?

> Richard, I'm sure you can answer this off the top of your head, but
> I'd have to go digging through the archives to pull out the relevant
> discussions so I figured I would just ask you for a reminder ... ?  I
> imagine it's also possible things have changed a bit since those early
> discussions and the solution we arrived at then no longer makes as
> much sense as it did before.

Agreed, it doesn't make as much sense now as it did when proposed, but
will make more sense in the future depending on when this change gets
accepted upstream.  This is why I wanted this patch to go through as
part of ghak81 at the time the rest of it did so that future kABI issues
would be easier to handle, but that ship has long sailed.  I didn't make
that argument then and I regret it now that I realize and recall some of
the thinking behind the change.  Your reasons at the time were that
contid was the only user of that change but there have been some
CONFIG_AUDIT and CONFIG_AUDITSYSCALL changes since that were related.

> > diff --git a/init/init_task.c b/init/init_task.c
> > index 15089d15010a..92d34c4b7702 100644
> > --- a/init/init_task.c
> > +++ b/init/init_task.c
> > @@ -130,8 +130,7 @@ struct task_struct init_task
> >         .thread_group   = LIST_HEAD_INIT(init_task.thread_group),
> >         .thread_node    = LIST_HEAD_INIT(init_signals.thread_head),
> >  #ifdef CONFIG_AUDIT
> > -       .loginuid       = INVALID_UID,
> > -       .sessionid      = AUDIT_SID_UNSET,
> > +       .audit          = &init_struct_audit,
> >  #endif
> >  #ifdef CONFIG_PERF_EVENTS
> >         .perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
> > diff --git a/init/main.c b/init/main.c
> > index 0ead83e86b5a..349470ad7458 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -96,6 +96,7 @@
> >  #include <linux/jump_label.h>
> >  #include <linux/mem_encrypt.h>
> >  #include <linux/kcsan.h>
> > +#include <linux/audit.h>
> >
> >  #include <asm/io.h>
> >  #include <asm/bugs.h>
> > @@ -1028,6 +1029,7 @@ asmlinkage __visible void __init start_kernel(void)
> >         nsfs_init();
> >         cpuset_init();
> >         cgroup_init();
> > +       audit_task_init();
> >         taskstats_init_early();
> >         delayacct_init();
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 8c201f414226..5d8147a29291 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -203,6 +203,73 @@ struct audit_reply {
> >         struct sk_buff *skb;
> >  };
> >
> > +static struct kmem_cache *audit_task_cache;
> > +
> > +void __init audit_task_init(void)
> > +{
> > +       audit_task_cache = kmem_cache_create("audit_task",
> > +                                            sizeof(struct audit_task_info),
> > +                                            0, SLAB_PANIC, NULL);
> > +}
> > +
> > +/**
> > + * audit_alloc - allocate an audit info block for a task
> > + * @tsk: task
> > + *
> > + * Call audit_alloc_syscall to filter on the task information and
> > + * allocate a per-task audit context if necessary.  This is called from
> > + * copy_process, so no lock is needed.
> > + */
> > +int audit_alloc(struct task_struct *tsk)
> > +{
> > +       int ret = 0;
> > +       struct audit_task_info *info;
> > +
> > +       info = kmem_cache_alloc(audit_task_cache, GFP_KERNEL);
> > +       if (!info) {
> > +               ret = -ENOMEM;
> > +               goto out;
> > +       }
> > +       info->loginuid = audit_get_loginuid(current);
> > +       info->sessionid = audit_get_sessionid(current);
> > +       tsk->audit = info;
> > +
> > +       ret = audit_alloc_syscall(tsk);
> > +       if (ret) {
> > +               tsk->audit = NULL;
> > +               kmem_cache_free(audit_task_cache, info);
> > +       }
> > +out:
> > +       return ret;
> > +}
> 
> This is a big nitpick, and I'm only mentioning this in the case you
> need to respin this patchset: the "out" label is unnecessary in the
> function above.  Simply return the error code, there is no need to
> jump to "out" only to immediately return the error code there and
> nothing more.

Agreed.  This must have been due to some restructuring that no longer
needed an exit cleanup action.

> > +struct audit_task_info init_struct_audit = {
> > +       .loginuid = INVALID_UID,
> > +       .sessionid = AUDIT_SID_UNSET,
> > +#ifdef CONFIG_AUDITSYSCALL
> > +       .ctx = NULL,
> > +#endif
> > +};
> > +
> > +/**
> > + * audit_free - free per-task audit info
> > + * @tsk: task whose audit info block to free
> > + *
> > + * Called from copy_process and do_exit
> > + */
> > +void audit_free(struct task_struct *tsk)
> > +{
> > +       struct audit_task_info *info = tsk->audit;
> > +
> > +       audit_free_syscall(tsk);
> > +       /* Freeing the audit_task_info struct must be performed after
> > +        * audit_log_exit() due to need for loginuid and sessionid.
> > +        */
> > +       info = tsk->audit;
> > +       tsk->audit = NULL;
> > +       kmem_cache_free(audit_task_cache, info);
> 
> Another nitpick, and this one may even become a moot point given the
> question posed above.  However, is there any reason we couldn't get
> rid of "info" and simplify this a bit?

That info allocation and assignment does now seem pointless, I agree...

>   audit_free_syscall(tsk);
>   kmem_cache_free(audit_task_cache, tsk->audit);
>   tsk->audit = NULL;
> 
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 468a23390457..f00c1da587ea 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -1612,7 +1615,6 @@ void __audit_free(struct task_struct *tsk)
> >                 if (context->current_state == AUDIT_RECORD_CONTEXT)
> >                         audit_log_exit();
> >         }
> > -
> >         audit_set_context(tsk, NULL);
> >         audit_free_context(context);
> >  }
> 
> This nitpick is barely worth the time it is taking me to write this,
> but the whitespace change above isn't strictly necessary.

Sure, it is a harmless but noisy cleanup when the function was being
cleaned up and renamed.  It wasn't an accident, but a style preference.
Do you prefer a vertical space before cleanup actions at the end of
functions and more versus less vertical whitespace in general?

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635



* Re: [PATCH ghak90 V9 01/13] audit: collect audit task parameters
  2020-07-07  2:50     ` Richard Guy Briggs
@ 2020-07-08  1:42       ` Paul Moore
  2020-07-13 20:29         ` Richard Guy Briggs
  0 siblings, 1 reply; 42+ messages in thread
From: Paul Moore @ 2020-07-08  1:42 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo,
	netdev, linux-fsdevel, Eric Paris, mpatel, Serge Hallyn

On Mon, Jul 6, 2020 at 10:50 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-07-05 11:09, Paul Moore wrote:
> > On Sat, Jun 27, 2020 at 9:21 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > >
> > > The audit-related parameters in struct task_struct should ideally be
> > > collected together and accessed through a standard audit API.
> > >
> > > Collect the existing loginuid, sessionid and audit_context together in a
> > > new struct audit_task_info called "audit" in struct task_struct.
> > >
> > > Use kmem_cache to manage this pool of memory.
> > > Un-inline audit_free() to be able to always recover that memory.
> > >
> > > Please see the upstream github issue
> > > https://github.com/linux-audit/audit-kernel/issues/81
> > >
> > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > > Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > ---
> > >  include/linux/audit.h | 49 +++++++++++++++++++++++------------
> > >  include/linux/sched.h |  7 +----
> > >  init/init_task.c      |  3 +--
> > >  init/main.c           |  2 ++
> > >  kernel/audit.c        | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  kernel/audit.h        |  5 ++++
> > >  kernel/auditsc.c      | 26 ++++++++++---------
> > >  kernel/fork.c         |  1 -
> > >  8 files changed, 124 insertions(+), 40 deletions(-)
> > >
> > > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > > index 3fcd9ee49734..c2150415f9df 100644
> > > --- a/include/linux/audit.h
> > > +++ b/include/linux/audit.h
> > > @@ -100,6 +100,16 @@ enum audit_nfcfgop {
> > >         AUDIT_XT_OP_UNREGISTER,
> > >  };
> > >
> > > +struct audit_task_info {
> > > +       kuid_t                  loginuid;
> > > +       unsigned int            sessionid;
> > > +#ifdef CONFIG_AUDITSYSCALL
> > > +       struct audit_context    *ctx;
> > > +#endif
> > > +};
> > > +
> > > +extern struct audit_task_info init_struct_audit;
> > > +
> > >  extern int is_audit_feature_set(int which);
> > >
> > >  extern int __init audit_register_class(int class, unsigned *list);
> >
> > ...
> >
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index b62e6aaf28f0..2213ac670386 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -34,7 +34,6 @@
> > >  #include <linux/kcsan.h>
> > >
> > >  /* task_struct member predeclarations (sorted alphabetically): */
> > > -struct audit_context;
> > >  struct backing_dev_info;
> > >  struct bio_list;
> > >  struct blk_plug;
> > > @@ -946,11 +945,7 @@ struct task_struct {
> > >         struct callback_head            *task_works;
> > >
> > >  #ifdef CONFIG_AUDIT
> > > -#ifdef CONFIG_AUDITSYSCALL
> > > -       struct audit_context            *audit_context;
> > > -#endif
> > > -       kuid_t                          loginuid;
> > > -       unsigned int                    sessionid;
> > > +       struct audit_task_info          *audit;
> > >  #endif
> > >         struct seccomp                  seccomp;
> >
> > > In the early days of this patchset we talked a lot about how to handle
> > > the task_struct and the changes that would be necessary, ultimately
> > > deciding to encapsulate all of the audit fields in an
> > > audit_task_info struct.  However, what is puzzling me a bit at this
> > > moment is why we are only including audit_task_info in task_struct by
> > > reference *and* making it a build time conditional (via CONFIG_AUDIT).
> >
> > > If audit is enabled at build time it would seem that we are always
> > > going to allocate an audit_task_info struct, so I have to wonder why
> > > we don't simply embed it inside the task_struct (similar to the
> > > seccomp struct in the snippet above)?  Of course the audit_context
> > > struct needs to remain as is, I'm talking only about the
> > > task_struct/audit_task_info struct.
>
> I agree that including the audit_task_info struct in the struct
> task_struct would have been preferred to simplify allocation and free,
> but the reason it was included by reference instead was to make the
> task_struct size independent of audit so that future changes would not
> cause as many kABI challenges.  This first change will cause kABI
> challenges regardless, but it was future ones that we were trying to
> ease.
>
> Does that match with your recollection?

I guess, sure.  I suppose what I was really asking was if we had a
"good" reason for not embedding the audit_task_info struct.
Regardless, thanks for the explanation, that was helpful.

From an upstream perspective, I think embedding the audit_task_info
struct is the Right Thing To Do.  The code is cleaner and more robust
if we embed the struct.

> > Richard, I'm sure you can answer this off the top of your head, but
> > I'd have to go digging through the archives to pull out the relevant
> > discussions so I figured I would just ask you for a reminder ... ?  I
> > imagine it's also possible things have changed a bit since those early
> > discussions and the solution we arrived at then no longer makes as
> > much sense as it did before.
>
> Agreed, it doesn't make as much sense now as it did when proposed, but
> will make more sense in the future depending on when this change gets
> accepted upstream.  This is why I wanted this patch to go through as
> part of ghak81 at the time the rest of it did so that future kABI issues
> would be easier to handle, but that ship has long sailed.

To be clear, kABI issues with task_struct really aren't an issue with
the upstream kernel.  I know that you know all of this already
Richard, I'm mostly talking to everyone else on the To/CC line in case
they are casually watching this discussion.

While I'm sympathetic to long-lifetime enterprise distros such as
RHEL, my responsibility is to ensure the upstream kernel is as good as
we can make it, and in this case I believe that means embedding
audit_task_info into the task_struct.

> I didn't make
> that argument then and I regret it now that I realize and recall some of
> the thinking behind the change.  Your reasons at the time were that
> contid was the only user of that change but there have been some
> CONFIG_AUDIT and CONFIG_AUDITSYSCALL changes since that were related.

Agreed that there are probably some common goals and benefits with
those changes and the audit container ID work, however, I believe that
discussion quickly goes back to upstream vs RHEL.

> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index 468a23390457..f00c1da587ea 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -1612,7 +1615,6 @@ void __audit_free(struct task_struct *tsk)
> > >                 if (context->current_state == AUDIT_RECORD_CONTEXT)
> > >                         audit_log_exit();
> > >         }
> > > -
> > >         audit_set_context(tsk, NULL);
> > >         audit_free_context(context);
> > >  }
> >
> > This nitpick is barely worth the time it is taking me to write this,
> > but the whitespace change above isn't strictly necessary.
>
> Sure, it is a harmless but noisy cleanup when the function was being
> cleaned up and renamed.  It wasn't an accident, but a style preference.
> Do you prefer a vertical space before cleanup actions at the end of
> functions and more versus less vertical whitespace in general?

As I mentioned above, this really was barely worth mentioning, but I
made the comment simply because I feel this patchset is going to draw
a lot of attention once it is merged and I feel keeping the patchset
as small, and as focused, as possible is a good thing.

However, I'm not going to lose even a second of sleep over a single
blank line gone missing ;)

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH ghak90 V9 01/13] audit: collect audit task parameters
  2020-07-08  1:42       ` Paul Moore
@ 2020-07-13 20:29         ` Richard Guy Briggs
  2020-07-14  0:44           ` Paul Moore
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-07-13 20:29 UTC (permalink / raw)
  To: Paul Moore
  Cc: nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo,
	netdev, linux-fsdevel, Eric Paris, mpatel, Serge Hallyn

On 2020-07-07 21:42, Paul Moore wrote:
> On Mon, Jul 6, 2020 at 10:50 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2020-07-05 11:09, Paul Moore wrote:
> > > On Sat, Jun 27, 2020 at 9:21 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > >
> > > > The audit-related parameters in struct task_struct should ideally be
> > > > collected together and accessed through a standard audit API.
> > > >
> > > > Collect the existing loginuid, sessionid and audit_context together in a
> > > > new struct audit_task_info called "audit" in struct task_struct.
> > > >
> > > > Use kmem_cache to manage this pool of memory.
> > > > Un-inline audit_free() to be able to always recover that memory.
> > > >
> > > > Please see the upstream github issue
> > > > https://github.com/linux-audit/audit-kernel/issues/81
> > > >
> > > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > > > Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > > ---
> > > >  include/linux/audit.h | 49 +++++++++++++++++++++++------------
> > > >  include/linux/sched.h |  7 +----
> > > >  init/init_task.c      |  3 +--
> > > >  init/main.c           |  2 ++
> > > >  kernel/audit.c        | 71 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > > >  kernel/audit.h        |  5 ++++
> > > >  kernel/auditsc.c      | 26 ++++++++++---------
> > > >  kernel/fork.c         |  1 -
> > > >  8 files changed, 124 insertions(+), 40 deletions(-)
> > > >
> > > > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > > > index 3fcd9ee49734..c2150415f9df 100644
> > > > --- a/include/linux/audit.h
> > > > +++ b/include/linux/audit.h
> > > > @@ -100,6 +100,16 @@ enum audit_nfcfgop {
> > > >         AUDIT_XT_OP_UNREGISTER,
> > > >  };
> > > >
> > > > +struct audit_task_info {
> > > > +       kuid_t                  loginuid;
> > > > +       unsigned int            sessionid;
> > > > +#ifdef CONFIG_AUDITSYSCALL
> > > > +       struct audit_context    *ctx;
> > > > +#endif
> > > > +};
> > > > +
> > > > +extern struct audit_task_info init_struct_audit;
> > > > +
> > > >  extern int is_audit_feature_set(int which);
> > > >
> > > >  extern int __init audit_register_class(int class, unsigned *list);
> > >
> > > ...
> > >
> > > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > > index b62e6aaf28f0..2213ac670386 100644
> > > > --- a/include/linux/sched.h
> > > > +++ b/include/linux/sched.h
> > > > @@ -34,7 +34,6 @@
> > > >  #include <linux/kcsan.h>
> > > >
> > > >  /* task_struct member predeclarations (sorted alphabetically): */
> > > > -struct audit_context;
> > > >  struct backing_dev_info;
> > > >  struct bio_list;
> > > >  struct blk_plug;
> > > > @@ -946,11 +945,7 @@ struct task_struct {
> > > >         struct callback_head            *task_works;
> > > >
> > > >  #ifdef CONFIG_AUDIT
> > > > -#ifdef CONFIG_AUDITSYSCALL
> > > > -       struct audit_context            *audit_context;
> > > > -#endif
> > > > -       kuid_t                          loginuid;
> > > > -       unsigned int                    sessionid;
> > > > +       struct audit_task_info          *audit;
> > > >  #endif
> > > >         struct seccomp                  seccomp;
> > >
> > > > In the early days of this patchset we talked a lot about how to handle
> > > > the task_struct and the changes that would be necessary, ultimately
> > > > deciding on encapsulating all of the audit fields into an
> > > > audit_task_info struct.  However, what is puzzling me a bit at this
> > > > moment is why we are only including audit_task_info in task_struct by
> > > > reference *and* making it a build time conditional (via CONFIG_AUDIT).
> > > >
> > > > If audit is enabled at build time it would seem that we are always
> > > > going to allocate an audit_task_info struct, so I have to wonder why
> > > > we don't simply embed it inside the task_struct (similar to the
> > > > seccomp struct in the snippet above)?  Of course the audit_context
> > > > struct needs to remain as is, I'm talking only about the
> > > > task_struct/audit_task_info struct.
> >
> > I agree that including the audit_task_info struct in the struct
> > task_struct would have been preferred to simplify allocation and free,
> > but the reason it was included by reference instead was to make the
> > task_struct size independent of audit so that future changes would not
> > cause as many kABI challenges.  This first change will cause kABI
> > challenges regardless, but it was future ones that we were trying to
> > ease.
> >
> > Does that match with your recollection?
> 
> I guess, sure.  I suppose what I was really asking was if we had a
> "good" reason for not embedding the audit_task_info struct.
> Regardless, thanks for the explanation, that was helpful.

Making it dynamic was actually your idea back in the spring of 2018:
	https://lkml.org/lkml/2018/4/18/759

The first two iterations were embedded to more quickly prove the idea:
	https://lkml.org/lkml/2018/5/12/173
		https://lkml.org/lkml/2018/5/12/168

And then switched as strongly recommended to a dynamic pointer:
	https://lkml.org/lkml/2018/5/16/461
		https://lkml.org/lkml/2018/5/16/457

I was initially concerned about switching to a dynamically allocated
structure, but those concerns are a couple of years behind us.

What significant change has happened since then to alter your
perspective?

> From an upstream perspective, I think embedding the audit_task_info
> struct is the Right Thing To Do.  The code is cleaner and more robust
> if we embed the struct.

I would agree if the audit subsystem were done.  It isn't.

> > > Richard, I'm sure you can answer this off the top of your head, but
> > > I'd have to go digging through the archives to pull out the relevant
> > > discussions so I figured I would just ask you for a reminder ... ?  I
> > > imagine it's also possible things have changed a bit since those early
> > > discussions and the solution we arrived at then no longer makes as
> > > much sense as it did before.
> >
> > Agreed, it doesn't make as much sense now as it did when proposed, but
> > will make more sense in the future depending on when this change gets
> > accepted upstream.  This is why I wanted this patch to go through as
> > part of ghak81 at the time the rest of it did so that future kABI issues
> > would be easier to handle, but that ship has long sailed.
> 
> To be clear, kABI issues with task_struct really aren't an issue with
> the upstream kernel.  I know that you know all of this already
> Richard, I'm mostly talking to everyone else on the To/CC line in case
> they are casually watching this discussion.

kABI issues may not be as much of an upstream issue, but part of the goal
here was to address upstream kernel issues: isolating the kernel audit
changes to its own subsystem, affecting struct task_struct as little as
possible in the future, and protecting it from "abuse" (as you had
expressed serious concerns) by the rest of the kernel.
include/linux/sched.h will need to know more about struct audit_task_info
if it is embedded, making it more susceptible to abuse.

> While I'm sympathetic to long-lifetime enterprise distros such as
> RHEL, my responsibility is to ensure the upstream kernel is as good as
> we can make it, and in this case I believe that means embedding
> audit_task_info into the task_struct.

Keeping audit_task_info dynamic will also make embedding struct
audit_context as a zero-length array at the end of it possible in the
future as an internal audit subsystem optimization, whereas embedding
audit_task_info would largely preclude that.  Were it embedded, any
future change to audit_task_info would change struct task_struct, which
is what we had agreed was a good thing to avoid to keep audit as isolated
and independent as possible.
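
As a rough userspace illustration of the trailing-array idea (this is not
code from the patchset; info_sketch, ctx_sketch and info_sketch_alloc() are
made-up names, and calloc() merely stands in for the kmem_cache
allocation), a dynamically allocated struct can carry its context in the
same allocation through a C99 flexible array member:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ctx_sketch {                 /* hypothetical stand-in for struct audit_context */
	int in_syscall;
	int major;
};

struct info_sketch {                /* hypothetical stand-in for struct audit_task_info */
	unsigned int loginuid;      /* simplified: plain integer, not kuid_t */
	unsigned int sessionid;
	size_t nctx;                /* number of trailing ctx elements (0 or 1 here) */
	struct ctx_sketch ctx[];    /* flexible array member, must be the last field */
};

/* One allocation covers the fixed part plus nctx trailing contexts. */
struct info_sketch *info_sketch_alloc(size_t nctx)
{
	struct info_sketch *info;

	info = calloc(1, sizeof(*info) + nctx * sizeof(info->ctx[0]));
	if (info)
		info->nctx = nctx;
	return info;
}
```

Standard C does not allow a struct that ends in a flexible array member
to be a member of another struct, which is why this layout only pairs
naturally with the pointer approach.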

This method has been well exercised over the last two years of
development, testing and rebases, so I'm not particularly concerned
about its dynamic nature any more.  It works well.  At this point this
change seems to be more gratuitously disruptive than helpful.

> > I didn't make
> > that argument then and I regret it now that I realize and recall some of
> > the thinking behind the change.  Your reasons at the time were that
> > contid was the only user of that change but there have been some
> > CONFIG_AUDIT and CONFIG_AUDITSYSCALL changes since that were related.
> 
> Agreed that there are probably some common goals and benefits with
> those changes and the audit container ID work, however, I believe that
> discussion quickly goes back to upstream vs RHEL.

I didn't think things were quite so cut and dried with respect to upstream
vs downstream.

> > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > index 468a23390457..f00c1da587ea 100644
> > > > --- a/kernel/auditsc.c
> > > > +++ b/kernel/auditsc.c
> > > > @@ -1612,7 +1615,6 @@ void __audit_free(struct task_struct *tsk)
> > > >                 if (context->current_state == AUDIT_RECORD_CONTEXT)
> > > >                         audit_log_exit();
> > > >         }
> > > > -
> > > >         audit_set_context(tsk, NULL);
> > > >         audit_free_context(context);
> > > >  }
> > >
> > > This nitpick is barely worth the time it is taking me to write this,
> > > but the whitespace change above isn't strictly necessary.
> >
> > Sure, it is a harmless but noisy cleanup when the function was being
> > cleaned up and renamed.  It wasn't an accident, but a style preference.
> > Do you prefer a vertical space before cleanup actions at the end of
> > functions and more versus less vertical whitespace in general?
> 
> As I mentioned above, this really was barely worth mentioning, but I
> made the comment simply because I feel this patchset is going to draw
> a lot of attention once it is merged and I feel keeping the patchset
> as small, and as focused, as possible is a good thing.

Is this concern also affecting the perspective on the change from
pointer to embedded above?

> However, I'm not going to lose even a second of sleep over a single
> blank line gone missing ;)
> 
> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH ghak90 V9 01/13] audit: collect audit task parameters
  2020-07-13 20:29         ` Richard Guy Briggs
@ 2020-07-14  0:44           ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-07-14  0:44 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo,
	netdev, linux-fsdevel, Eric Paris, mpatel, Serge Hallyn

On Mon, Jul 13, 2020 at 4:30 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-07-07 21:42, Paul Moore wrote:
> > On Mon, Jul 6, 2020 at 10:50 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2020-07-05 11:09, Paul Moore wrote:
> > > > On Sat, Jun 27, 2020 at 9:21 AM Richard Guy Briggs <rgb@redhat.com> wrote:

...

> > > > In the early days of this patchset we talked a lot about how to handle
> > > > the task_struct and the changes that would be necessary, ultimately
> > > > deciding on encapsulating all of the audit fields into an
> > > > audit_task_info struct.  However, what is puzzling me a bit at this
> > > > moment is why we are only including audit_task_info in task_struct by
> > > > reference *and* making it a build time conditional (via CONFIG_AUDIT).
> > > >
> > > > If audit is enabled at build time it would seem that we are always
> > > > going to allocate an audit_task_info struct, so I have to wonder why
> > > > we don't simply embed it inside the task_struct (similar to the
> > > > seccomp struct in the snippet above)?  Of course the audit_context
> > > > struct needs to remain as is, I'm talking only about the
> > > > task_struct/audit_task_info struct.
> > >
> > > I agree that including the audit_task_info struct in the struct
> > > task_struct would have been preferred to simplify allocation and free,
> > > but the reason it was included by reference instead was to make the
> > > task_struct size independent of audit so that future changes would not
> > > cause as many kABI challenges.  This first change will cause kABI
> > > challenges regardless, but it was future ones that we were trying to
> > > ease.
> > >
> > > Does that match with your recollection?
> >
> > I guess, sure.  I suppose what I was really asking was if we had a
> > "good" reason for not embedding the audit_task_info struct.
> > Regardless, thanks for the explanation, that was helpful.
>
> Making it dynamic was actually your idea back in the spring of 2018:
>         https://lkml.org/lkml/2018/4/18/759

If you read my comments from 2018 carefully, or even not so carefully
I think, you'll notice that my primary motivation for using a pointer
was to "hide" the audit_task_info struct contents so that they
couldn't be abused by other kernel subsystems looking for a general
container identifier inside the kernel.  As we've discussed many times
before, this patchset is not a general purpose container identifier,
this is an ***audit*** container ID; limiting the scope and usage of
this identifier is what has allowed us to gain the begrudging
acceptance we've had thus far and I believe it is the key to success.

For whatever it is worth, this patchset doesn't hide the
audit_task_info definition in a kernel/audit*.c file, it lives in a
header file which is easily accessed by other subsystems.

In my opinion we should pick one of two options: leave it as a pointer
reference and "hide" the struct definition, or just embed the struct
and simplify the code.  I see little value in openly defining the
audit_task_info struct and using a pointer reference; if you believe
you have a valid argument for why this makes sense I'm open to hearing
it, but your comments thus far have been unconvincing.
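
For anyone skimming the thread, the "pointer plus hidden definition"
option can be sketched in userspace C roughly as follows.  This is only an
illustration under stated assumptions, not code from the patchset: every
name here (task_sketch, audit_info_sketch, the audit_sketch_*() accessors
and SKETCH_SID_UNSET) is made up, and malloc() stands in for the kernel
allocator.  The point is that code which only sees the forward
declaration cannot inspect the fields and has to go through the
accessors:

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a shared header would expose (hypothetical) --- */
struct audit_info_sketch;                   /* opaque: no layout visible */

struct task_sketch {                        /* stand-in for task_struct */
	int pid;
	struct audit_info_sketch *audit;    /* a pointer only needs the forward declaration */
};

#define SKETCH_SID_UNSET ((unsigned int)-1) /* made-up "not set" marker */

int audit_sketch_alloc(struct task_sketch *tsk);
void audit_sketch_free(struct task_sketch *tsk);
unsigned int audit_sketch_get_sessionid(const struct task_sketch *tsk);
void audit_sketch_set_sessionid(struct task_sketch *tsk, unsigned int id);

/* --- what would stay private to the audit code (hypothetical) --- */
struct audit_info_sketch {
	unsigned int loginuid;              /* simplified: plain integer, not kuid_t */
	unsigned int sessionid;
};

int audit_sketch_alloc(struct task_sketch *tsk)
{
	tsk->audit = malloc(sizeof(*tsk->audit));
	if (!tsk->audit)
		return -1;
	tsk->audit->loginuid = 0;
	tsk->audit->sessionid = SKETCH_SID_UNSET;
	return 0;
}

void audit_sketch_free(struct task_sketch *tsk)
{
	free(tsk->audit);
	tsk->audit = NULL;
}

unsigned int audit_sketch_get_sessionid(const struct task_sketch *tsk)
{
	/* tolerate a task that has no audit info attached */
	return tsk->audit ? tsk->audit->sessionid : SKETCH_SID_UNSET;
}

void audit_sketch_set_sessionid(struct task_sketch *tsk, unsigned int id)
{
	assert(tsk->audit != NULL);
	tsk->audit->sessionid = id;
}
```

Embedding the struct instead would require the full definition to be
visible wherever the task structure is defined, so the hiding benefit
only exists with the pointer form.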

> > > > Richard, I'm sure you can answer this off the top of your head, but
> > > > I'd have to go digging through the archives to pull out the relevant
> > > > discussions so I figured I would just ask you for a reminder ... ?  I
> > > > imagine it's also possible things have changed a bit since those early
> > > > discussions and the solution we arrived at then no longer makes as
> > > > much sense as it did before.
> > >
> > > Agreed, it doesn't make as much sense now as it did when proposed, but
> > > will make more sense in the future depending on when this change gets
> > > accepted upstream.  This is why I wanted this patch to go through as
> > > part of ghak81 at the time the rest of it did so that future kABI issues
> > > would be easier to handle, but that ship has long sailed.
> >
> > To be clear, kABI issues with task_struct really aren't an issue with
> > the upstream kernel.  I know that you know all of this already
> > Richard, I'm mostly talking to everyone else on the To/CC line in case
> > they are casually watching this discussion.
>
> kABI issues may not be as much of an upstream issue, but part of the goal
> here was to address upstream kernel issues: isolating the kernel audit
> changes to its own subsystem, affecting struct task_struct as little as
> possible in the future, and protecting it from "abuse" (as you had
> expressed serious concerns) by the rest of the kernel.
> include/linux/sched.h will need to know more about struct audit_task_info
> if it is embedded, making it more susceptible to abuse.

I define "abuse" in this context as other kernel subsystems inspecting
the contents of the audit_task_info struct, most likely to try and
approximate a general container identifier.

Better separation between the audit subsystem and the task_struct,
while conceptually nice, isn't critical and is easily changed upstream
with each kernel release as it isn't part of the kernel/userspace API.
In any case, a basic conceptual separation is achieved by the
audit_task_info struct whether it is embedded into the task_struct
or included by a pointer reference.

> > While I'm sympathetic to long-lifetime enterprise distros such as
> > RHEL, my responsibility is to ensure the upstream kernel is as good as
> > we can make it, and in this case I believe that means embedding
> > audit_task_info into the task_struct.
>
> Keeping audit_task_info dynamic will also make embedding struct
> audit_context as a zero-length array at the end of it possible in the
> future as an internal audit subsystem optimization, whereas embedding
> audit_task_info would largely preclude that.

Predicting the future is hard, but I would be comfortable giving up on
a variable length audit_task_info struct.  Besides, if we *really* had
to do that in the future we could, it's not part of the
kernel/userspace API.

> This method has been well exercised over the last two years of
> development, testing and rebases, so I'm not particularly concerned
> about its dynamic nature any more.  It works well.  At this point this
> change seems to be more gratuitously disruptive than helpful.

It may not seem like it, but at this point in this patchset's life I
do try to limit my comments to only those things which I feel are
substantive.  In the cases where I think something is borderline I'll
mention that in my comments.  The trivial cases I'll generally call
out as "nitpicks".  I assure you my comments are not gratuitous.

I look forward to reviewing another round of this patchset about as
much as I expect you look forward to writing, testing, and submitting
it.

> > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > index 468a23390457..f00c1da587ea 100644
> > > > > --- a/kernel/auditsc.c
> > > > > +++ b/kernel/auditsc.c
> > > > > @@ -1612,7 +1615,6 @@ void __audit_free(struct task_struct *tsk)
> > > > >                 if (context->current_state == AUDIT_RECORD_CONTEXT)
> > > > >                         audit_log_exit();
> > > > >         }
> > > > > -
> > > > >         audit_set_context(tsk, NULL);
> > > > >         audit_free_context(context);
> > > > >  }
> > > >
> > > > This nitpick is barely worth the time it is taking me to write this,
> > > > but the whitespace change above isn't strictly necessary.
> > >
> > > Sure, it is a harmless but noisy cleanup when the function was being
> > > cleaned up and renamed.  It wasn't an accident, but a style preference.
> > > Do you prefer a vertical space before cleanup actions at the end of
> > > functions and more versus less vertical whitespace in general?
> >
> > As I mentioned above, this really was barely worth mentioning, but I
> > made the comment simply because I feel this patchset is going to draw
> > a lot of attention once it is merged and I feel keeping the patchset
> > as small, and as focused, as possible is a good thing.
>
> Is this concern also affecting the perspective on the change from
> pointer to embedded above?

Keeping this particular patchset small and focused has always been a
goal; I know we talked about this at least once, likely more than
that, while I was still at RH and we were talking offline.

If something is going to be contentious, it is better to be small and
focused on the contention.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH ghak90 V9 08/13] audit: add containerid support for user records
  2020-07-05 15:11   ` Paul Moore
@ 2020-07-18  0:43     ` Richard Guy Briggs
  2020-08-21 18:34       ` Paul Moore
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-07-18  0:43 UTC (permalink / raw)
  To: Paul Moore
  Cc: nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo,
	netdev, linux-fsdevel, Eric Paris, mpatel, Serge Hallyn

On 2020-07-05 11:11, Paul Moore wrote:
> On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > Add audit container identifier auxiliary record to user event standalone
> > records.
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >  kernel/audit.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 54dd2cb69402..997c34178ee8 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -1507,6 +1504,14 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                                 audit_log_n_untrustedstring(ab, str, data_len);
> >                         }
> >                         audit_log_end(ab);
> > +                       rcu_read_lock();
> > +                       cont = _audit_contobj_get(current);
> > +                       rcu_read_unlock();
> > +                       audit_log_container_id(context, cont);
> > +                       rcu_read_lock();
> > +                       _audit_contobj_put(cont);
> > +                       rcu_read_unlock();
> > +                       audit_free_context(context);
> 
> I haven't searched the entire patchset, but it seems like the pattern
> above happens a couple of times in this patchset, yes?  If so would it
> make sense to wrap the above get/log/put in a helper function?

I've redone the locking with an rcu lock around the get and a spinlock
around the put.  It occurs to me that putting an rcu lock around the
whole thing and doing a get without the refcount increment would save
us the spinlock and the put, and would be acceptable since stale but
consistent information is fine while traversing the contobj list from
this point to report it.  The problem with that is needing to use
GFP_ATOMIC due to the rcu lock.  If I stick with the spinlock around the
put then I can use GFP_KERNEL and just grab the spinlock while traversing
the contobj list.

> Not a big deal either way, I'm pretty neutral on it at this point in
> the patchset but thought it might be worth mentioning in case you
> noticed the same and were on the fence.

There is only one other place this is used, in audit_log_exit in
auditsc.c.  I had noted the pattern but wasn't sure it was worth it.
Inline or not?  Should we just let the compiler decide?

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH ghak90 V9 10/13] audit: add support for containerid to network namespaces
  2020-07-05 15:11   ` Paul Moore
@ 2020-07-21 22:05     ` Richard Guy Briggs
  0 siblings, 0 replies; 42+ messages in thread
From: Richard Guy Briggs @ 2020-07-21 22:05 UTC (permalink / raw)
  To: Paul Moore
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On 2020-07-05 11:11, Paul Moore wrote:
> On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > This also adds support to qualify NETFILTER_PKT records.
> >
> > Audit events could happen in a network namespace outside of a task
> > context due to packets received from the net that trigger an auditing
> > rule prior to being associated with a running task.  The network
> > namespace could be in use by multiple containers by association to the
> > tasks in that network namespace.  We still want a way to attribute
> > these events to any potential containers.  Keep a list per network
> > namespace to track these audit container identifiers.
> >
> > Add/increment the audit container identifier on:
> > - initial setting of the audit container identifier via /proc
> > - clone/fork call that inherits an audit container identifier
> > - unshare call that inherits an audit container identifier
> > - setns call that inherits an audit container identifier
> > Delete/decrement the audit container identifier on:
> > - an inherited audit container identifier dropped when child set
> > - process exit
> > - unshare call that drops a net namespace
> > - setns call that drops a net namespace
> >
> > Add audit container identifier auxiliary record(s) to NETFILTER_PKT
> > event standalone records.  Iterate through all potential audit container
> > identifiers associated with a network namespace.
> >
> > Please see the github audit kernel issue for contid net support:
> >   https://github.com/linux-audit/audit-kernel/issues/92
> > Please see the github audit testsuite issue for the test case:
> >   https://github.com/linux-audit/audit-testsuite/issues/64
> > Please see the github audit wiki for the feature overview:
> >   https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >  include/linux/audit.h    |  20 ++++++
> >  kernel/audit.c           | 156 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  kernel/nsproxy.c         |   4 ++
> >  net/netfilter/nft_log.c  |  11 +++-
> >  net/netfilter/xt_AUDIT.c |  11 +++-
> >  5 files changed, 195 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index c4a755ae0d61..304fbb7c3c5b 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -128,6 +128,13 @@ struct audit_task_info {
> >
> >  extern struct audit_task_info init_struct_audit;
> >
> > +struct audit_contobj_netns {
> > +       struct list_head        list;
> > +       struct audit_contobj    *obj;
> > +       int                     count;
> 
> This seems like it might be a good candidate for refcount_t, yes?

I considered this before when converting the struct audit_contobj to
refcount_t, but decided against it since any updates are in the context
of a list traversal where it could be added to the list and so the
spinlock is already held anyways.

Is there a more efficient or elegant way of doing the locking around the
two list traversals below (_add and _del)?

I wonder about converting the count to refcount_t and only holding the
spinlock for the list_add_rcu() in the _add case.  And for the _del case
holding the spinlock only for the list_del_rcu().

These are the only two locations items are added or deleted from the
lists.
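
To make the counting behaviour under discussion easier to follow, here is
a single-threaded userspace sketch of the add-or-increment and
decrement-or-remove logic.  It is an assumption-laden illustration, not
the kernel code: contns_sketch and its helpers are made-up names, a plain
singly linked list replaces the RCU list, malloc() replaces
kmalloc(GFP_ATOMIC), and no locking is shown at all (in the patch the
whole traversal runs under contobj_list_lock):

```c
#include <assert.h>
#include <stdlib.h>

struct contns_sketch {              /* hypothetical stand-in for struct audit_contobj_netns */
	struct contns_sketch *next;
	const void *obj;            /* identity of the container object */
	int count;                  /* users of obj in this namespace */
};

/* Bump the count if obj is already listed, otherwise add a new entry. */
int contns_sketch_add(struct contns_sketch **head, const void *obj)
{
	struct contns_sketch *e;

	for (e = *head; e; e = e->next) {
		if (e->obj == obj) {
			e->count++;
			return 0;
		}
	}
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->obj = obj;
	e->count = 1;
	e->next = *head;            /* position is arbitrary in this sketch */
	*head = e;
	return 0;
}

/* Drop one use of obj; unlink and free the entry when the count hits zero. */
void contns_sketch_del(struct contns_sketch **head, const void *obj)
{
	struct contns_sketch **pp, *e;

	for (pp = head; (e = *pp) != NULL; pp = &e->next) {
		if (e->obj != obj)
			continue;
		if (--e->count < 1) {
			*pp = e->next;
			free(e);
		}
		return;
	}
}

/* Report the current count for obj, or 0 if it is not listed. */
int contns_sketch_count(const struct contns_sketch *head, const void *obj)
{
	for (; head; head = head->next)
		if (head->obj == obj)
			return head->count;
	return 0;
}
```

Because the lookup and the insert (or the decrement and the unlink) have
to appear atomic to other writers, shrinking the locked region to just
the list_add_rcu()/list_del_rcu() call would need care even with a
refcount_t counter.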

Somewhat related to this: does the list order matter?  Items are
currently added at the end of the list which likely makes locking
simpler, though the start of the list is a simple change.  However,
unless we understand the profile of read use of these lists for
reporting contid use in audit_log_netns_contid_list() I don't think
order matters significantly.  It could be that reporting of a contid
goes down in frequency over the lifetime of a contid, in which case
inserting them at the beginning of the list would be best.  This is not
a visible implementation detail so later optimization should pose no
problem.

> > +       struct rcu_head         rcu;
> > +};
> 
> ...
> 
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 997c34178ee8..a862721dfd9b 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -437,6 +452,136 @@ static struct sock *audit_get_sk(const struct net *net)
> >         return aunet->sk;
> >  }
> >
> > +void audit_netns_contid_add(struct net *net, struct audit_contobj *cont)
> > +{
> > +       struct audit_net *aunet;
> > +       struct list_head *contobj_list;
> > +       struct audit_contobj_netns *contns;
> > +
> > +       if (!net)
> > +               return;
> > +       if (!cont)
> > +               return;
> > +       aunet = net_generic(net, audit_net_id);
> > +       if (!aunet)
> > +               return;
> > +       contobj_list = &aunet->contobj_list;
> > +       rcu_read_lock();
> > +       spin_lock(&aunet->contobj_list_lock);
> > +       list_for_each_entry_rcu(contns, contobj_list, list)
> > +               if (contns->obj == cont) {
> > +                       contns->count++;
> > +                       goto out;
> > +               }
> > +       contns = kmalloc(sizeof(*contns), GFP_ATOMIC);
> > +       if (contns) {
> > +               INIT_LIST_HEAD(&contns->list);
> > +               contns->obj = cont;
> > +               contns->count = 1;
> > +               list_add_rcu(&contns->list, contobj_list);
> > +       }
> > +out:
> > +       spin_unlock(&aunet->contobj_list_lock);
> > +       rcu_read_unlock();
> > +}
> > +
> > +void audit_netns_contid_del(struct net *net, struct audit_contobj *cont)
> > +{
> > +       struct audit_net *aunet;
> > +       struct list_head *contobj_list;
> > +       struct audit_contobj_netns *contns = NULL;
> > +
> > +       if (!net)
> > +               return;
> > +       if (!cont)
> > +               return;
> > +       aunet = net_generic(net, audit_net_id);
> > +       if (!aunet)
> > +               return;
> > +       contobj_list = &aunet->contobj_list;
> > +       rcu_read_lock();
> > +       spin_lock(&aunet->contobj_list_lock);
> > +       list_for_each_entry_rcu(contns, contobj_list, list)
> > +               if (contns->obj == cont) {
> > +                       contns->count--;
> > +                       if (contns->count < 1) {
> 
> One could simplify this with "(--contns->count) < 1", although if it
> is changed to a refcount_t (which seems like a smart thing), the
> normal decrement/test would be the best choice.

Agreed.
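
To illustrate the "normal decrement/test" idiom mentioned above, here
is a small userspace model built on C11 atomics.  It is only a sketch:
struct model_refcount and its helpers are hypothetical stand-ins for
refcount_t, and they leave out the saturation and underflow checks the
real kernel type performs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for refcount_t (no saturation checks). */
struct model_refcount {
	atomic_uint refs;
};

static inline void model_refcount_set(struct model_refcount *r,
				      unsigned int n)
{
	atomic_store(&r->refs, n);
}

static inline void model_refcount_inc(struct model_refcount *r)
{
	atomic_fetch_add(&r->refs, 1);
}

/* Drop one reference; true only for the caller that reaches zero. */
static inline bool model_refcount_dec_and_test(struct model_refcount *r)
{
	/* atomic_fetch_sub() returns the value held before subtracting. */
	return atomic_fetch_sub(&r->refs, 1) == 1;
}
```

In the _del path this would fold the separate decrement and "< 1"
comparison into a single call whose true result triggers the
list_del_rcu() and kfree_rcu().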

> > +                               list_del_rcu(&contns->list);
> > +                               kfree_rcu(contns, rcu);
> > +                       }
> > +                       break;
> > +               }
> > +       spin_unlock(&aunet->contobj_list_lock);
> > +       rcu_read_unlock();
> > +}
> 
> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635



* Re: [PATCH ghak90 V9 06/13] audit: add contid support for signalling the audit daemon
  2020-07-05 15:10   ` Paul Moore
@ 2020-07-29 19:00     ` Richard Guy Briggs
  2020-08-21 18:48       ` Paul Moore
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-07-29 19:00 UTC (permalink / raw)
  To: Paul Moore
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On 2020-07-05 11:10, Paul Moore wrote:
> On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > Add audit container identifier support to the action of signalling the
> > audit daemon.
> >
> > Since this would need to add an element to the audit_sig_info struct,
> > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > audit_sig_info2 struct.  Corresponding support is required in the
> > userspace code to reflect the new record request and reply type.
> > An older userspace won't break since it won't know to request this
> > record type.
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> >  include/linux/audit.h       |  8 ++++
> >  include/uapi/linux/audit.h  |  1 +
> >  kernel/audit.c              | 95 ++++++++++++++++++++++++++++++++++++++++++++-
> >  security/selinux/nlmsgtab.c |  1 +
> >  4 files changed, 104 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 5eeba0efffc2..89cf7c66abe6 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -22,6 +22,13 @@ struct audit_sig_info {
> >         char            ctx[];
> >  };
> >
> > +struct audit_sig_info2 {
> > +       uid_t           uid;
> > +       pid_t           pid;
> > +       u32             cid_len;
> > +       char            data[];
> > +};
> > +
> >  struct audit_buffer;
> >  struct audit_context;
> >  struct inode;
> > @@ -105,6 +112,7 @@ struct audit_contobj {
> >         u64                     id;
> >         struct task_struct      *owner;
> >         refcount_t              refcount;
> > +       refcount_t              sigflag;
> >         struct rcu_head         rcu;
> >  };
> 
> It seems like we need some protection in audit_set_contid() so that we
> don't allow reuse of an audit container ID when "refcount == 0 &&
> sigflag != 0", yes?

We have it, see -ESHUTDOWN below.

> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index fd98460c983f..a56ad77069b9 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -72,6 +72,7 @@
> >  #define AUDIT_SET_FEATURE      1018    /* Turn an audit feature on or off */
> >  #define AUDIT_GET_FEATURE      1019    /* Get which features are enabled */
> >  #define AUDIT_CONTAINER_OP     1020    /* Define the container id and info */
> > +#define AUDIT_SIGNAL_INFO2     1021    /* Get info auditd signal sender */
> >
> >  #define AUDIT_FIRST_USER_MSG   1100    /* Userspace messages mostly uninteresting to kernel */
> >  #define AUDIT_USER_AVC         1107    /* We filter this differently */
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index a09f8f661234..54dd2cb69402 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -126,6 +126,8 @@ struct auditd_connection {
> >  kuid_t         audit_sig_uid = INVALID_UID;
> >  pid_t          audit_sig_pid = -1;
> >  u32            audit_sig_sid = 0;
> > +static struct audit_contobj *audit_sig_cid;
> > +static struct task_struct *audit_sig_atsk;
> 
> This looks like a typo, or did you mean "atsk" for some reason?

No, I meant atsk to refer specifically to the audit daemon task and not
any other random one that is doing the signalling.  I can change it if
there is a strong objection.

> >  /* Records can be lost in several ways:
> >     0) [suppressed in audit_alloc]
> > @@ -239,7 +241,33 @@ void _audit_contobj_put(struct audit_contobj *cont)
> >  {
> >         if (!cont)
> >                 return;
> > -       if (refcount_dec_and_test(&cont->refcount)) {
> > +       if (refcount_dec_and_test(&cont->refcount) && !refcount_read(&cont->sigflag)) {
> > +               put_task_struct(cont->owner);
> > +               list_del_rcu(&cont->list);
> > +               kfree_rcu(cont, rcu);
> > +       }
> > +}
> 
> It seems like it might be a good idea to modify the corresponding
> _get() to WARN on the reuse of audit container objects where refcount
> is zero, similar to the comment I made above.  What do you think?

This will never happen.  See -ESHUTDOWN below.

> > +/* rcu_read_lock must be held by caller unless new */
> > +static struct audit_contobj *_audit_contobj_get_sig(struct task_struct *tsk)
> > +{
> > +       struct audit_contobj *cont;
> > +
> > +       if (!tsk->audit)
> > +               return NULL;
> > +       cont = tsk->audit->cont;
> > +       if (cont)
> > +               refcount_set(&cont->sigflag, 1);
> > +       return cont;
> > +}
> 
> If you are going to use a refcount and call this a "get" function you
> might as well make it do an increment and not just a set(1).  It's a bit
> silly with just one auditd per system, but I suppose it will make more
> sense when we have multiple audit daemons.  In a related comment, you
> probably want to rename "sigflag" to "sigcount" or similar.

I preferred that previously.  I'll go back to that.

> In summary, it's either a reference that supports multiple gets/puts
> or it's a flag with just an on/off; it shouldn't attempt to straddle
> both, that's both confusing and fragile.

Agreed.  I'll switch it back to refcount.
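
A rough userspace sketch of the counted variant follows, where the
signal-info reference is a real count rather than an on/off flag and
the object is released only once both counts are zero.  All names here
(struct model_contobj, model_get_sig() and so on) are hypothetical, the
"released" flag stands in for list_del_rcu() plus kfree_rcu(), and the
callers are assumed to hold the list spinlock, which is why plain
integers are used instead of atomics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Model object kept alive by two independent counts. */
struct model_contobj {
	unsigned int refcount;	/* tasks using this container ID */
	unsigned int sigcount;	/* pending signal-info references */
	bool released;		/* stands in for unlinking and freeing */
};

/* Release only when neither kind of reference remains. */
static inline void model_release_if_unused(struct model_contobj *c)
{
	if (!c->refcount && !c->sigcount)
		c->released = true;
}

/* Take a signal-info reference: an increment, not a set(1). */
static inline struct model_contobj *model_get_sig(struct model_contobj *c)
{
	if (c)
		c->sigcount++;
	return c;
}

/* Drop a signal-info reference. */
static inline void model_put_sig(struct model_contobj *c)
{
	if (!c || !c->sigcount)
		return;
	c->sigcount--;
	model_release_if_unused(c);
}

/* Drop a task reference. */
static inline void model_put(struct model_contobj *c)
{
	if (!c || !c->refcount)
		return;
	c->refcount--;
	model_release_if_unused(c);
}
```

Because each get is matched by exactly one put, two outstanding signal
references keep the object alive until both are dropped, which a
set(1)/set(0) flag cannot express.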

> > +/* rcu_read_lock must be held by caller */
> > +static void _audit_contobj_put_sig(struct audit_contobj *cont)
> > +{
> > +       if (!cont)
> > +               return;
> > +       refcount_set(&cont->sigflag, 0);
> > +       if (!refcount_read(&cont->refcount)) {
> >                 put_task_struct(cont->owner);
> >                 list_del_rcu(&cont->list);
> >                 kfree_rcu(cont, rcu);
> > @@ -309,6 +337,13 @@ void audit_free(struct task_struct *tsk)
> >         info = tsk->audit;
> >         tsk->audit = NULL;
> >         kmem_cache_free(audit_task_cache, info);
> > +       rcu_read_lock();
> > +       if (audit_sig_atsk == tsk) {
> > +               _audit_contobj_put_sig(audit_sig_cid);
> > +               audit_sig_cid = NULL;
> > +               audit_sig_atsk = NULL;
> > +       }
> > +       rcu_read_unlock();
> >  }
> >
> >  /**
> > @@ -1132,6 +1167,7 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
> >         case AUDIT_ADD_RULE:
> >         case AUDIT_DEL_RULE:
> >         case AUDIT_SIGNAL_INFO:
> > +       case AUDIT_SIGNAL_INFO2:
> >         case AUDIT_TTY_GET:
> >         case AUDIT_TTY_SET:
> >         case AUDIT_TRIM:
> > @@ -1294,6 +1330,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >         struct audit_buffer     *ab;
> >         u16                     msg_type = nlh->nlmsg_type;
> >         struct audit_sig_info   *sig_data;
> > +       struct audit_sig_info2  *sig_data2;
> >         char                    *ctx = NULL;
> >         u32                     len;
> >
> > @@ -1559,6 +1596,52 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                                  sig_data, sizeof(*sig_data) + len);
> >                 kfree(sig_data);
> >                 break;
> > +       case AUDIT_SIGNAL_INFO2: {
> > +               unsigned int contidstrlen = 0;
> > +
> > +               len = 0;
> > +               if (audit_sig_sid) {
> > +                       err = security_secid_to_secctx(audit_sig_sid, &ctx,
> > +                                                      &len);
> > +                       if (err)
> > +                               return err;
> > +               }
> > +               if (audit_sig_cid) {
> > +                       contidstr = kmalloc(21, GFP_KERNEL);
> > +                       if (!contidstr) {
> > +                               if (audit_sig_sid)
> > +                                       security_release_secctx(ctx, len);
> > +                               return -ENOMEM;
> > +                       }
> > +                       contidstrlen = scnprintf(contidstr, 20, "%llu", audit_sig_cid->id);
> > +               }
> > +               sig_data2 = kmalloc(sizeof(*sig_data2) + contidstrlen + len, GFP_KERNEL);
> > +               if (!sig_data2) {
> > +                       if (audit_sig_sid)
> > +                               security_release_secctx(ctx, len);
> > +                       kfree(contidstr);
> > +                       return -ENOMEM;
> > +               }
> > +               sig_data2->uid = from_kuid(&init_user_ns, audit_sig_uid);
> > +               sig_data2->pid = audit_sig_pid;
> > +               if (audit_sig_cid) {
> > +                       memcpy(sig_data2->data, contidstr, contidstrlen);
> > +                       sig_data2->cid_len = contidstrlen;
> > +                       kfree(contidstr);
> > +               }
> > +               if (audit_sig_sid) {
> > +                       memcpy(sig_data2->data + contidstrlen, ctx, len);
> > +                       security_release_secctx(ctx, len);
> > +               }
> > +               rcu_read_lock();
> > +               _audit_contobj_put_sig(audit_sig_cid);
> > +               rcu_read_unlock();
> 
> We probably want to drop the reference in the legacy/AUDIT_SIGNAL_INFO
> case too, right?

Yes, thank you for catching that.  This would be the case of an old
userspace and we don't want that kernel mem/contid leak.
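
For anyone following the reply layout in the hunk above, this is a
userspace sketch of how the data[] area of the proposed
AUDIT_SIGNAL_INFO2 reply could be packed and split: the contid text
comes first (cid_len bytes), followed by the security context.  The
struct mirrors the proposed struct audit_sig_info2 with fixed-width
types; sig_info2_pack() and sig_info2_split() are hypothetical helpers,
not part of the patch or of any audit library.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Userspace mirror of the layout proposed for struct audit_sig_info2. */
struct sig_info2_model {
	uint32_t uid;
	int32_t  pid;
	uint32_t cid_len;	/* bytes of contid text at start of data[] */
	char     data[];	/* contid text, then security context */
};

/* Build a reply in @out; returns bytes used, or 0 if @out is too small. */
static inline size_t sig_info2_pack(void *out, size_t outlen,
				    uint32_t uid, int32_t pid,
				    uint64_t contid,
				    const char *ctx, size_t ctxlen)
{
	char cid[21];	/* 20 decimal digits for UINT64_MAX plus the NUL */
	struct sig_info2_model *m = out;
	size_t need;
	int n;

	n = snprintf(cid, sizeof(cid), "%" PRIu64, contid);
	if (n < 0)
		return 0;
	need = sizeof(*m) + (size_t)n + ctxlen;
	if (need > outlen)
		return 0;
	m->uid = uid;
	m->pid = pid;
	m->cid_len = (uint32_t)n;
	memcpy(m->data, cid, (size_t)n);
	memcpy(m->data + n, ctx, ctxlen);
	return need;
}

/* Split data[] into its two parts; 0 on success, -1 if malformed. */
static inline int sig_info2_split(const struct sig_info2_model *m,
				  size_t total,
				  const char **cid, size_t *cidlen,
				  const char **ctx, size_t *ctxlen)
{
	size_t datalen;

	if (total < sizeof(*m))
		return -1;
	datalen = total - sizeof(*m);
	if (m->cid_len > datalen)
		return -1;
	*cid = m->data;
	*cidlen = m->cid_len;
	*ctx = m->data + m->cid_len;
	*ctxlen = datalen - m->cid_len;
	return 0;
}
```

The receiver needs the total message length to find the end of the
security context, since only the contid length is carried in the
header.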

> > +               audit_sig_cid = NULL;
> > +               audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO2, 0, 0,
> > +                                sig_data2, sizeof(*sig_data2) + contidstrlen + len);
> > +               kfree(sig_data2);
> > +               break;
> > +       }
> >         case AUDIT_TTY_GET: {
> >                 struct audit_tty_status s;
> >                 unsigned int t;
> > @@ -2470,6 +2553,11 @@ int audit_signal_info(int sig, struct task_struct *t)
> >                 else
> >                         audit_sig_uid = uid;
> >                 security_task_getsecid(current, &audit_sig_sid);
> > +               rcu_read_lock();
> > +               _audit_contobj_put_sig(audit_sig_cid);
> > +               audit_sig_cid = _audit_contobj_get_sig(current);
> > +               rcu_read_unlock();
> > +               audit_sig_atsk = t;
> >         }
> >
> >         return audit_signal_info_syscall(t);
> > @@ -2532,6 +2620,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> >                         if (cont->id == contid) {
> >                                 /* task injection to existing container */
> >                                 if (current == cont->owner) {
> > +                                       if (!refcount_read(&cont->refcount)) {
> > +                                               rc = -ESHUTDOWN;
> 
> Reuse -ENOTUNIQ; I'm not overly excited about providing a lot of
> detail here as these are global system objects.  If you must have a
> different errno (and I would prefer you didn't), use something like
> -EBUSY.

I don't understand the issue of "global system objects" since the only
time this error would be issued is if its own contid were being reused
but it hadn't cleaned up its own references yet by either issuing an
AUDIT_SIGNAL_INFO* request or the targeted audit daemon hadn't cleaned
up yet.  EBUSY could be confused with already having spawned threads or
children, and ENOTUNIQ could indicate that another orchestrator/engine
had stolen its desired contid after we released it and wanted to reuse
it.  This gets me thinking about making reservations for preferred
contids that are otherwise unavailable and making callbacks to indicate
when they become available, but that seems undesirably complex right
now.

> > +                                               spin_unlock(&audit_contobj_list_lock);
> > +                                               goto conterror;
> > +                                       }
> >                                         _audit_contobj_hold(cont);
> >                                         newcont = cont;
> >                                 } else {
> > diff --git a/security/selinux/nlmsgtab.c b/security/selinux/nlmsgtab.c
> > index b69231918686..8303bb7a63d0 100644
> > --- a/security/selinux/nlmsgtab.c
> > +++ b/security/selinux/nlmsgtab.c
> > @@ -137,6 +137,7 @@ struct nlmsg_perm {
> >         { AUDIT_DEL_RULE,       NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
> >         { AUDIT_USER,           NETLINK_AUDIT_SOCKET__NLMSG_RELAY    },
> >         { AUDIT_SIGNAL_INFO,    NETLINK_AUDIT_SOCKET__NLMSG_READ     },
> > +       { AUDIT_SIGNAL_INFO2,   NETLINK_AUDIT_SOCKET__NLMSG_READ     },
> >         { AUDIT_TRIM,           NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
> >         { AUDIT_MAKE_EQUIV,     NETLINK_AUDIT_SOCKET__NLMSG_WRITE    },
> >         { AUDIT_TTY_GET,        NETLINK_AUDIT_SOCKET__NLMSG_READ     },
> 
> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635



* Re: [PATCH ghak90 V9 05/13] audit: log container info of syscalls
  2020-07-05 15:10   ` Paul Moore
@ 2020-07-29 19:40     ` Richard Guy Briggs
  2020-08-21 19:15       ` Paul Moore
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-07-29 19:40 UTC (permalink / raw)
  To: Paul Moore
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On 2020-07-05 11:10, Paul Moore wrote:
> On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > Create a new audit record AUDIT_CONTAINER_ID to document the audit
> > container identifier of a process if it is present.
> >
> > Called from audit_log_exit(), syscalls are covered.
> >
> > Include target_cid references from ptrace and signal.
> >
> > A sample raw event:
> > type=SYSCALL msg=audit(1519924845.499:257): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56374e1cef30 a2=241 a3=1b6 items=2 ppid=606 pid=635 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="bash" exe="/usr/bin/bash" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="tmpcontainerid"
> > type=CWD msg=audit(1519924845.499:257): cwd="/root"
> > type=PATH msg=audit(1519924845.499:257): item=0 name="/tmp/" inode=13863 dev=00:27 mode=041777 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:tmp_t:s0 nametype= PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> > type=PATH msg=audit(1519924845.499:257): item=1 name="/tmp/tmpcontainerid" inode=17729 dev=00:27 mode=0100644 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0
> > type=PROCTITLE msg=audit(1519924845.499:257): proctitle=62617368002D6300736C65657020313B206563686F2074657374203E202F746D702F746D70636F6E7461696E65726964
> > type=CONTAINER_ID msg=audit(1519924845.499:257): contid=123458
> >
> > Please see the github audit kernel issue for the main feature:
> >   https://github.com/linux-audit/audit-kernel/issues/90
> > Please see the github audit userspace issue for supporting additions:
> >   https://github.com/linux-audit/audit-userspace/issues/51
> > Please see the github audit testsuite issue for the test case:
> >   https://github.com/linux-audit/audit-testsuite/issues/64
> > Please see the github audit wiki for the feature overview:
> >   https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > Acked-by: Serge Hallyn <serge@hallyn.com>
> > Acked-by: Steve Grubb <sgrubb@redhat.com>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >  include/linux/audit.h      |  7 +++++++
> >  include/uapi/linux/audit.h |  1 +
> >  kernel/audit.c             | 25 +++++++++++++++++++++++--
> >  kernel/audit.h             |  4 ++++
> >  kernel/auditsc.c           | 45 +++++++++++++++++++++++++++++++++++++++------
> >  5 files changed, 74 insertions(+), 8 deletions(-)
> 
> ...
> 
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 9e0b38ce1ead..a09f8f661234 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -2211,6 +2211,27 @@ void audit_log_session_info(struct audit_buffer *ab)
> >         audit_log_format(ab, "auid=%u ses=%u", auid, sessionid);
> >  }
> >
> > +/*
> > + * audit_log_container_id - report container info
> > + * @context: task or local context for record
> > + * @cont: container object to report
> > + */
> > +void audit_log_container_id(struct audit_context *context,
> > +                           struct audit_contobj *cont)
> > +{
> > +       struct audit_buffer *ab;
> > +
> > +       if (!cont)
> > +               return;
> > +       /* Generate AUDIT_CONTAINER_ID record with container ID */
> > +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_ID);
> > +       if (!ab)
> > +               return;
> > +       audit_log_format(ab, "contid=%llu", contid);
> 
> Did this patch compile?  Where is "contid" coming from?  I'm guessing
> you mean to get it from "cont", but that isn't what appears to be
> happening; likely a casualty of the object vs token discussion we had
> during the last review cycle.

Yes, it was supposed to be cont->id.

> I'm assuming this code gets modified later in this patchset and you
> only compiled tested the patchset as a whole.  Please make sure the
> patchset compiles at each patch along the way to applying them all;
> this helps ensure that git bisect remains useful and it fits better
> with the general idea that individual patches must have merit on their
> own.

Yes, agreed.

> ... and yes, I do check for this when merging patchsets, it isn't just
> a visual inspection, I compile test each patch.
> 
> If nothing else, at least this answers the question of if it is worth
> respinning or not (this alone requires a respin).
> 
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index f03d3eb0752c..9e79645e5c0e 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -1458,6 +1466,7 @@ static void audit_log_exit(void)
> >         struct audit_buffer *ab;
> >         struct audit_aux_data *aux;
> >         struct audit_names *n;
> > +       struct audit_contobj *cont;
> >
> >         context->personality = current->personality;
> >
> > @@ -1541,7 +1550,7 @@ static void audit_log_exit(void)
> >         for (aux = context->aux_pids; aux; aux = aux->next) {
> >                 struct audit_aux_data_pids *axs = (void *)aux;
> >
> > -               for (i = 0; i < axs->pid_count; i++)
> > +               for (i = 0; i < axs->pid_count; i++) {
> >                         if (audit_log_pid_context(context, axs->target_pid[i],
> >                                                   axs->target_auid[i],
> >                                                   axs->target_uid[i],
> > @@ -1549,14 +1558,20 @@ static void audit_log_exit(void)
> >                                                   axs->target_sid[i],
> >                                                   axs->target_comm[i]))
> >                                 call_panic = 1;
> > +                       audit_log_container_id(context, axs->target_cid[i]);
> > +               }
> 
> It might be nice to see an audit event example including the
> ptrace/signal information.  I'm concerned there may be some confusion
> about associating the different audit container IDs with the correct
> information in the event.

This is the subject of ghat81, which is a test for ptrace and signal
records.

This was the reason I had advocated for an op= field since there is a
possibility of multiple contid records per event.

> >         }
> >
> > -       if (context->target_pid &&
> > -           audit_log_pid_context(context, context->target_pid,
> > -                                 context->target_auid, context->target_uid,
> > -                                 context->target_sessionid,
> > -                                 context->target_sid, context->target_comm))
> > +       if (context->target_pid) {
> > +               if (audit_log_pid_context(context, context->target_pid,
> > +                                         context->target_auid,
> > +                                         context->target_uid,
> > +                                         context->target_sessionid,
> > +                                         context->target_sid,
> > +                                         context->target_comm))
> >                         call_panic = 1;
> > +               audit_log_container_id(context, context->target_cid);
> > +       }
> >
> >         if (context->pwd.dentry && context->pwd.mnt) {
> >                 ab = audit_log_start(context, GFP_KERNEL, AUDIT_CWD);
> > @@ -1575,6 +1590,14 @@ static void audit_log_exit(void)
> >
> >         audit_log_proctitle();
> >
> > +       rcu_read_lock();
> > +       cont = _audit_contobj_get(current);
> > +       rcu_read_unlock();
> > +       audit_log_container_id(context, cont);
> > +       rcu_read_lock();
> > +       _audit_contobj_put(cont);
> > +       rcu_read_unlock();
> 
> Do we need to grab an additional reference for the audit container
> object here?  We don't create any additional references here that
> persist beyond the lifetime of this function, right?

Why do we need another reference?  There's one for each pointer pointing
to it and so far we have just one from this task.  Or are you thinking
of the contid hash list, which is only added to when a task points to it
and gets removed from that list when the last task stops pointing to it.
Later that gets more complicated with network namespaces and nested
container objects.  For now we just needed it while generating the
record, then it gets freed.

> >         audit_log_container_drop();
> >
> >         /* Send end of event record to help user space know we are finished */
> 
> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635



* Re: [PATCH ghak90 V9 02/13] audit: add container id
  2020-07-05 15:09   ` Paul Moore
@ 2020-07-29 20:05     ` Richard Guy Briggs
  2020-08-21 19:36       ` Paul Moore
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-07-29 20:05 UTC (permalink / raw)
  To: Paul Moore
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On 2020-07-05 11:09, Paul Moore wrote:
> On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> >
> > Implement the proc fs write to set the audit container identifier of a
> > process, emitting an AUDIT_CONTAINER_OP record to document the event.
> >
> > This is a write from the container orchestrator task to a proc entry of
> > the form /proc/PID/audit_containerid where PID is the process ID of the
> > newly created task that is to become the first task in a container, or
> > an additional task added to a container.
> >
> > The write expects up to a u64 value (unset: 18446744073709551615).
> >
> > The writer must have capability CAP_AUDIT_CONTROL.
> >
> > This will produce a record such as this:
> >   type=CONTAINER_OP msg=audit(2018-06-06 12:39:29.636:26949) : op=set opid=2209 contid=123456 old-contid=18446744073709551615
> >
> > The "op" field indicates an initial set.  The "opid" field is the
> > object's PID, the process being "contained".  New and old audit
> > container identifier values are given in the "contid" fields.
> >
> > It is not permitted to unset the audit container identifier.
> > A child inherits its parent's audit container identifier.
> >
> > Store the audit container identifier in a refcounted kernel object that
> > is added to the master list of audit container identifiers.  This will
> > allow multiple container orchestrators/engines to work on the same
> > machine without danger of inadvertently re-using an existing identifier.
> > It will also allow an orchestrator to inject a process into an existing
> > container by checking if the original container owner is the one
> > injecting the task.  A hash table list is used to optimize searches.
> >
> > Please see the github audit kernel issue for the main feature:
> >   https://github.com/linux-audit/audit-kernel/issues/90
> > Please see the github audit userspace issue for supporting additions:
> >   https://github.com/linux-audit/audit-userspace/issues/51
> > Please see the github audit testsuite issue for the test case:
> >   https://github.com/linux-audit/audit-testsuite/issues/64
> > Please see the github audit wiki for the feature overview:
> >   https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > Acked-by: Serge Hallyn <serge@hallyn.com>
> > Acked-by: Steve Grubb <sgrubb@redhat.com>
> > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> > ---
> >  fs/proc/base.c             |  36 +++++++++++
> >  include/linux/audit.h      |  33 ++++++++++
> >  include/uapi/linux/audit.h |   2 +
> >  kernel/audit.c             | 148 +++++++++++++++++++++++++++++++++++++++++++++
> >  kernel/audit.h             |   8 +++
> >  5 files changed, 227 insertions(+)
> 
> ...
> 
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index c2150415f9df..2800d4f1a2a8 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -692,6 +715,16 @@ static inline bool audit_loginuid_set(struct task_struct *tsk)
> >         return uid_valid(audit_get_loginuid(tsk));
> >  }
> >
> > +static inline bool audit_contid_valid(u64 contid)
> > +{
> > +       return contid != AUDIT_CID_UNSET;
> > +}
> > +
> > +static inline bool audit_contid_set(struct task_struct *tsk)
> > +{
> > +       return audit_contid_valid(audit_get_contid(tsk));
> > +}
> 
> This is quasi-nitpicky, but it seems like audit_contid_valid() and
> audit_contid_set() should be moved to kernel/audit.h if possible
> (possibly even kernel/audit.c).  Maybe I'll see something later in the
> patchset, but right now I'm struggling to think of why anyone outside
> of audit would need to call these functions.

This was historical and made moot by the conversion to contobj.  I moved
them to kernel/audit.c, then went with a single open-coded test, and
finally with just checking for the existence of a contobj.

> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 5d8147a29291..6d387793f702 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -138,6 +138,13 @@ struct auditd_connection {
> >
> >  /* Hash for inode-based rules */
> >  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> > +/* Hash for contid object lists */
> > +struct list_head audit_contid_hash[AUDIT_CONTID_BUCKETS];
> > +/* Lock all additions and deletions to the contid hash lists, assignment
> > + * of container objects to tasks.  There should be no need for
> > + * interaction with tasklist_lock
> > + */
> > +static DEFINE_SPINLOCK(audit_contobj_list_lock);
> >
> >  static struct kmem_cache *audit_buffer_cache;
> >
> > @@ -212,6 +219,33 @@ void __init audit_task_init(void)
> >                                              0, SLAB_PANIC, NULL);
> >  }
> >
> > +/* rcu_read_lock must be held by caller unless new */
> > +static struct audit_contobj *_audit_contobj_hold(struct audit_contobj *cont)
> > +{
> > +       if (cont)
> > +               refcount_inc(&cont->refcount);
> > +       return cont;
> > +}
> > +
> > +static struct audit_contobj *_audit_contobj_get(struct task_struct *tsk)
> > +{
> > +       if (!tsk->audit)
> > +               return NULL;
> > +       return _audit_contobj_hold(tsk->audit->cont);
> > +}
> > +
> > +/* rcu_read_lock must be held by caller */
> > +static void _audit_contobj_put(struct audit_contobj *cont)
> > +{
> > +       if (!cont)
> > +               return;
> > +       if (refcount_dec_and_test(&cont->refcount)) {
> > +               put_task_struct(cont->owner);
> > +               list_del_rcu(&cont->list);
> 
> You should check your locking; I'm used to seeing exclusive locks
> (e.g. the spinlock) around list adds/removes, it's just reads/traversals
> that can be done with just the RCU lock held.

Ok, I've redone the locking yet again.  I knew this on one level but
that didn't translate consistently to code...

> > +               kfree_rcu(cont, rcu);
> > +       }
> > +}
> 
> Another nitpick, but it might be nice to have similar arguments to the
> _get() and _put() functions, e.g. struct audit_contobj, but that is
> some serious bikeshedding (basically rename _hold() to _get() and
> rename _hold to audit_task_contid_hold() or similar).

I have some idea what you are trying to say, but I think you misspoke.
Did you mean rename _hold to _get, rename _get to
audit_task_contobj_hold()?

> >  /**
> >   * audit_alloc - allocate an audit info block for a task
> >   * @tsk: task
> > @@ -232,6 +266,9 @@ int audit_alloc(struct task_struct *tsk)
> >         }
> >         info->loginuid = audit_get_loginuid(current);
> >         info->sessionid = audit_get_sessionid(current);
> > +       rcu_read_lock();
> > +       info->cont = _audit_contobj_get(current);
> > +       rcu_read_unlock();
> 
> The RCU locks aren't strictly necessary here, are they?  In fact I
> suppose we could probably just replace the _get() call with a
> refcount_set(1) just as we do in audit_set_contid(), yes?

I don't understand what you are getting at here.  It needs a *contobj,
along with bumping up the refcount of the existing contobj.

> >         tsk->audit = info;
> >
> >         ret = audit_alloc_syscall(tsk);
> > @@ -246,6 +283,7 @@ int audit_alloc(struct task_struct *tsk)
> >  struct audit_task_info init_struct_audit = {
> >         .loginuid = INVALID_UID,
> >         .sessionid = AUDIT_SID_UNSET,
> > +       .cont = NULL,
> >  #ifdef CONFIG_AUDITSYSCALL
> >         .ctx = NULL,
> >  #endif
> > @@ -262,6 +300,9 @@ void audit_free(struct task_struct *tsk)
> >         struct audit_task_info *info = tsk->audit;
> >
> >         audit_free_syscall(tsk);
> > +       rcu_read_lock();
> > +       _audit_contobj_put(tsk->audit->cont);
> > +       rcu_read_unlock();
> >         /* Freeing the audit_task_info struct must be performed after
> >          * audit_log_exit() due to need for loginuid and sessionid.
> >          */
> > @@ -1709,6 +1750,9 @@ static int __init audit_init(void)
> >         for (i = 0; i < AUDIT_INODE_BUCKETS; i++)
> >                 INIT_LIST_HEAD(&audit_inode_hash[i]);
> >
> > +       for (i = 0; i < AUDIT_CONTID_BUCKETS; i++)
> > +               INIT_LIST_HEAD(&audit_contid_hash[i]);
> > +
> >         mutex_init(&audit_cmd_mutex.lock);
> >         audit_cmd_mutex.owner = NULL;
> >
> > @@ -2410,6 +2454,110 @@ int audit_signal_info(int sig, struct task_struct *t)
> >         return audit_signal_info_syscall(t);
> >  }
> >
> > +/*
> > + * audit_set_contid - set current task's audit contid
> > + * @task: target task
> > + * @contid: contid value
> > + *
> > + * Returns 0 on success, -EPERM on permission failure.
> > + *
> > + * If the original container owner goes away, no task injection is
> > + * possible to an existing container.
> > + *
> > + * Called (set) from fs/proc/base.c::proc_contid_write().
> > + */
> > +int audit_set_contid(struct task_struct *task, u64 contid)
> > +{
> > +       int rc = 0;
> > +       struct audit_buffer *ab;
> > +       struct audit_contobj *oldcont = NULL;
> > +
> > +       task_lock(task);
> > +       /* Can't set if audit disabled */
> > +       if (!task->audit) {
> > +               task_unlock(task);
> > +               return -ENOPROTOOPT;
> > +       }
> 
> See my question/comment in patch 1/13; this check may not be needed or
> it may need to be changed to something other than "!task->audit".
> 
> > +       read_lock(&tasklist_lock);
> > +       /* Don't allow the contid to be unset */
> > +       if (!audit_contid_valid(contid)) {
> > +               rc = -EINVAL;
> > +               goto unlock;
> > +       }
> > +       /* if we don't have caps, reject */
> > +       if (!capable(CAP_AUDIT_CONTROL)) {
> > +               rc = -EPERM;
> > +               goto unlock;
> > +       }
> > +       /* if task has children or is not single-threaded, deny */
> > +       if (!list_empty(&task->children) ||
> > +           !(thread_group_leader(task) && thread_group_empty(task))) {
> > +               rc = -EBUSY;
> > +               goto unlock;
> > +       }
> > +       /* if contid is already set, deny */
> > +       if (audit_contid_set(task))
> > +               rc = -EEXIST;
> > +unlock:
> 
> Can we move the "unlock" target to the end of the function where it
> just handles the unlocking and returns an error, including the
> AUDIT_CONTAINER_OP record if necessary?  From what I can see we only
> jump to "unlock" in case of error where we are not going to set the
> audit container ID, yet the "unlock" target is placed in a misleading
> location in the middle of the function.  It may be that everything
> works correctly, but I would argue this is a bad practice that
> increases the likelihood of buggy behavior in future code changes.
> 
> If you can't find way to arrange the code nicely, just duplicate the
> "tasklist_lock" unlock operation in the error handlers before jumping
> down to the end of the function.  It isn't perfect, but I believe it
> will be a lot less fragile than the current approach.

I think it makes most sense to convert it back to an else-if ladder,
which will simplify things a bit and make it flow a bit better.
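
A minimal sketch of what such an else-if ladder could look like, written
as a standalone userspace function so it can be compiled on its own.
The struct and its boolean fields are hypothetical stand-ins for the
kernel checks named in the comments; only the ordering and the error
codes are taken from the patch quoted above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the checks in audit_set_contid(). */
struct contid_request {
	bool contid_valid;	/* audit_contid_valid(contid) */
	bool has_cap;		/* capable(CAP_AUDIT_CONTROL) */
	bool has_children;	/* !list_empty(&task->children) */
	bool single_threaded;	/* thread_group_leader() && thread_group_empty() */
	bool contid_set;	/* audit_contid_set(task) */
};

/* Returns 0 if the contid may be set, or a negative errno.  There is a
 * single exit, so no goto target is needed in the middle of the caller.
 */
static int contid_precheck(const struct contid_request *req)
{
	int rc = 0;

	if (!req->contid_valid)
		rc = -EINVAL;	/* don't allow the contid to be unset */
	else if (!req->has_cap)
		rc = -EPERM;	/* no capability, reject */
	else if (req->has_children || !req->single_threaded)
		rc = -EBUSY;	/* has children or is multi-threaded */
	else if (req->contid_set)
		rc = -EEXIST;	/* contid already set */
	return rc;
}
```

With this shape the caller can drop tasklist_lock once, right after the
ladder, and test rc before touching the contobj list.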

> > +       read_unlock(&tasklist_lock);
> > +       rcu_read_lock();
> > +       oldcont = _audit_contobj_get(task);
> > +       if (!rc) {
> > +               struct audit_contobj *cont = NULL, *newcont = NULL;
> > +               int h = audit_hash_contid(contid);
> > +
> > +               spin_lock(&audit_contobj_list_lock);
> > +               list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> > +                       if (cont->id == contid) {
> > +                               /* task injection to existing container */
> > +                               if (current == cont->owner) {
> > +                                       _audit_contobj_hold(cont);
> > +                                       newcont = cont;
> > +                               } else {
> > +                                       rc = -ENOTUNIQ;
> > +                                       spin_unlock(&audit_contobj_list_lock);
> > +                                       goto conterror;
> > +                               }
> > +                               break;
> > +                       }
> > +               if (!newcont) {
> > +                       newcont = kmalloc(sizeof(*newcont), GFP_ATOMIC);
> > +                       if (newcont) {
> > +                               INIT_LIST_HEAD(&newcont->list);
> > +                               newcont->id = contid;
> > +                               newcont->owner = get_task_struct(current);
> > +                               refcount_set(&newcont->refcount, 1);
> > +                               list_add_rcu(&newcont->list,
> > +                                            &audit_contid_hash[h]);
> > +                       } else {
> > +                               rc = -ENOMEM;
> > +                               spin_unlock(&audit_contobj_list_lock);
> > +                               goto conterror;
> > +                       }
> > +               }
> > +               spin_unlock(&audit_contobj_list_lock);
> > +               task->audit->cont = newcont;
> > +               _audit_contobj_put(oldcont);
> > +       }
> > +conterror:
> > +       task_unlock(task);
> > +
> > +       if (!audit_enabled)
> > +               return rc;
> > +
> > +       ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
> > +       if (!ab)
> > +               return rc;
> > +
> > +       audit_log_format(ab,
> > +                        "op=set opid=%d contid=%llu old-contid=%llu",
> > +                        task_tgid_nr(task), contid, oldcont ? oldcont->id : -1);
> > +       _audit_contobj_put(oldcont);
> > +       rcu_read_unlock();
> > +       audit_log_end(ab);
> > +       return rc;
> > +}
> 
> --
> paul moore
> www.paul-moore.com
> 

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635



* Re: [PATCH ghak90 V9 11/13] audit: contid check descendancy and nesting
  2020-07-05 15:11   ` Paul Moore
@ 2020-08-07 17:10     ` Richard Guy Briggs
  2020-08-21 20:13       ` Paul Moore
  0 siblings, 1 reply; 42+ messages in thread
From: Richard Guy Briggs @ 2020-08-07 17:10 UTC (permalink / raw)
  To: Paul Moore
  Cc: nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo,
	netdev, linux-fsdevel, Eric Paris, mpatel, Serge Hallyn, aris

On 2020-07-05 11:11, Paul Moore wrote:
> On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > Require the target task to be a descendant of the container
> > orchestrator/engine.
> >
> > You would only change the audit container ID from one set or inherited
> > value to another if you were nesting containers.
> >
> > If changing the contid, the container orchestrator/engine must be a
> > descendant and not same orchestrator as the one that set it so it is not
> > possible to change the contid of another orchestrator's container.

Are we able to agree on the premises above?  Is anything asserted that
should not be and is there anything missing?

I've been sitting on my response below for more than a week, trying to
understand the issues raised and to give the reply proper attention.
Please excuse my tardiness in replying on this issue; I'm still having
trouble thinking through all the scenarios for nesting.

> > Since the task_is_descendant() function is used in YAMA and in audit,
> > remove the duplication and pull the function into kernel/sched/core.c
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> >  include/linux/sched.h    |  3 +++
> >  kernel/audit.c           | 23 +++++++++++++++++++++--
> >  kernel/sched/core.c      | 33 +++++++++++++++++++++++++++++++++
> >  security/yama/yama_lsm.c | 33 ---------------------------------
> >  4 files changed, 57 insertions(+), 35 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 2213ac670386..06938d0b9e0c 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -2047,4 +2047,7 @@ static inline void rseq_syscall(struct pt_regs *regs)
> >
> >  const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
> >
> > +extern int task_is_descendant(struct task_struct *parent,
> > +                             struct task_struct *child);
> > +
> >  #endif
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index a862721dfd9b..efa65ec01239 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -2713,6 +2713,20 @@ int audit_signal_info(int sig, struct task_struct *t)
> >         return audit_signal_info_syscall(t);
> >  }
> >
> > +static bool audit_contid_isnesting(struct task_struct *tsk)
> > +{
> > +       bool isowner = false;
> > +       bool ownerisparent = false;
> > +
> > +       rcu_read_lock();
> > +       if (tsk->audit && tsk->audit->cont) {
> > +               isowner = current == tsk->audit->cont->owner;
> > +               ownerisparent = task_is_descendant(tsk->audit->cont->owner, current);
> 
> I want to make sure I'm understanding this correctly and I keep
> mentally tripping over something: it seems like for a given audit
> container ID a task is either the owner or a descendent, there is no
> third state, is that correct?

Sure there is.  It could be another owner (which is addressed when we
search for an existing contobj match), or in the next patch, the
owner's parent if nested or a peer.

> Assuming that is true, can the descendent check simply be a negative
> owner check given they both have the same audit container ID?

There isn't actually a check in my code for the orchestrator contid and
task contid being the same.  Maybe I was making this check more
complicated than necessary, and still incomplete, but see below for more...
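
To make the three states concrete (owner, descendant of the owner,
neither), here is a small userspace model of the check.  It is a sketch
only: struct task and its fields are hypothetical, there is no RCU or
thread-group handling, and the walk is assumed to start at the child
itself, so a task counts as its own descendant.  That assumption is why
the model, like the patch, needs the separate !isowner test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for task_struct. */
struct task {
	struct task *parent;		/* NULL at the root of the tree */
	struct task *cont_owner;	/* task that set the contid, or NULL */
};

/* True if parent is found walking up from child, child included. */
static bool task_is_descendant(const struct task *parent,
			       const struct task *child)
{
	const struct task *walker;

	if (!parent || !child)
		return false;
	for (walker = child; walker; walker = walker->parent)
		if (walker == parent)
			return true;
	return false;
}

/* Model of audit_contid_isnesting(): cur plays the role of current. */
static bool contid_isnesting(const struct task *cur, const struct task *tsk)
{
	bool isowner, ownerisparent;

	if (!tsk->cont_owner)
		return false;
	isowner = (cur == tsk->cont_owner);
	ownerisparent = task_is_descendant(tsk->cont_owner, cur);
	return !isowner && ownerisparent;
}
```

A peer that sits elsewhere in the process tree is neither the owner nor
a descendant of the owner, which is the third state mentioned above.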

> > +       }
> > +       rcu_read_unlock();
> > +       return !isowner && ownerisparent;
> > +}
> > +
> >  /*
> >   * audit_set_contid - set current task's audit contid
> >   * @task: target task
> > @@ -2755,8 +2769,13 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> >                 rc = -EBUSY;
> >                 goto unlock;
> >         }
> > -       /* if contid is already set, deny */
> > -       if (audit_contid_set(task))
> > +       /* if task is not descendant, block */
> > +       if (task == current || !task_is_descendant(current, task)) {
> 
> I'm also still fuzzy on why we can't let a task set its own audit
> container ID, assuming it meets all the criteria established in patch
> 2/13.  It somewhat made sense when you were tracking inherited vs
> explicitly set audit container IDs, but that doesn't appear to be the
> case so far in this patchset, yes?

I still have a strong reluctance to permit this.  I can't come up with
a solid technical reason right now, but it feels like a layer
violation.  If we forbid it and discover it necessary and harmless, then
permitting it won't break the API.  If we permit it and later discover a
reason it causes a problem, then blocking it will break the API.  I have
heard that there are cases where there is no orchestrator/engine, so in
those cases I conclude that a process would need to set its own contid
but I'm having trouble recalling what those circumstances are.

I also was seriously considering blocking any contid set on the initial
user or PID namespaces to avoid polluting them, and even had a tested
patch to implement it, but this starts making assumptions about the
definition of a container with respect to namespaces which we have been
deliberately avoiding.

> > +               rc = -EXDEV;
> 
> I'm fairly confident we had a discussion about not using all these
> different error codes, but that may be a moot point given my next
> comment.

Yes, we did.  I reduced both circumstances down to what you requested,
shedding two along the way.  Given the number of different ways
orchestrators, contids and tasks can be related, I'd rather have more,
not fewer, diagnostics to understand what it thinks is happening.  This
is a relatively minor detail in the context of the rest of the
discussion in this thread.

> > +               goto unlock;
> > +       }
> > +       /* only allow contid setting again if nesting */
> > +       if (audit_contid_set(task) && !audit_contid_isnesting(task))
> >                 rc = -EEXIST;
> 
> It seems like what we need in audit_set_contid() is a check to ensure
> that the task being modified is only modified by the owner of the
> audit container ID, yes?  If so, I would think we could do this quite
> easily with the following, or similar logic, (NOTE: assumes both
> current and tsk are properly setup):
> 
>   if ((current->audit->cont != tsk->audit->cont) || (current->audit->cont->owner != current))
>     return -EACCES;

Not necessarily.

If we start from the premise that once set, a contid on a task cannot be
unset, and then that it cannot be set to another value, then the oldest
ancestor in any container must not be able to change contid.  That
leaves any descendant (that hasn't threaded or parented) free to nest.

If we allow a task to modify its own contid (from the potential change
above), then if it inherited its contid, it could set its own.  This
still looks like a layer violation to me.  Going back to some
discussions with Eric Biederman from a number of years ago, it seems
wrong to me that a task should be able to see its own contid, let alone
be able to set it.  This came out of a CRIU concern about serial nsIDs
based on proc inode numbers not being portable.  Is it still a
consideration?

Another scenario comes to mind.  Should an orchestrator be able to set
the contid of a descendant of one of the former's child orchestrators?
Leaping generations like this doesn't sound like a good idea, and I
can't come up with a valid use case.

> This is somewhat independent of the above issue, but we may also want
> to add to the capability check.  Patch 2 adds a
> "capable(CAP_AUDIT_CONTROL)" which is good, but perhaps we also need a
> "ns_capable(CAP_AUDIT_CONTROL)" to allow a given audit container ID
> orchestrator/owner the ability to control which of its descendants
> can change their audit container ID, for example:
> 
>   if (!capable(CAP_AUDIT_CONTROL) ||
>       !ns_capable(current->nsproxy->user_ns, CAP_AUDIT_CONTROL))
>     return -EPERM;

Why does ns_capable keep being raised?  The last patch, capcontid, was
developed to solve this previously raised issue.  The issue was an
unprivileged user creating a user namespace with full capabilities,
circumventing capable() and being able to change the main audit
configuration.  It was already discussed in v8 and before that and my
last posting in the thread was left dangling with an unanswered
question:
https://lkml.org/lkml/2020/2/6/333

I only see this being potentially useful with audit namespaces in
conjunction with unprivileged user namespaces in the future with the
implementation of multiple audit daemons for the ability of an
unprivileged user to run their own distro container without influencing
the master audit configuration.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635



* Re: [PATCH ghak90 V9 08/13] audit: add containerid support for user records
  2020-07-18  0:43     ` Richard Guy Briggs
@ 2020-08-21 18:34       ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-08-21 18:34 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo,
	netdev, linux-fsdevel, Eric Paris, mpatel, Serge Hallyn

On Fri, Jul 17, 2020 at 8:44 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-07-05 11:11, Paul Moore wrote:
> > On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > >
> > > Add audit container identifier auxiliary record to user event standalone
> > > records.
> > >
> > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > Acked-by: Neil Horman <nhorman@tuxdriver.com>
> > > Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>
> > > ---
> > >  kernel/audit.c | 19 ++++++++++++-------
> > >  1 file changed, 12 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index 54dd2cb69402..997c34178ee8 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -1507,6 +1504,14 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > >                                 audit_log_n_untrustedstring(ab, str, data_len);
> > >                         }
> > >                         audit_log_end(ab);
> > > +                       rcu_read_lock();
> > > +                       cont = _audit_contobj_get(current);
> > > +                       rcu_read_unlock();
> > > +                       audit_log_container_id(context, cont);
> > > +                       rcu_read_lock();
> > > +                       _audit_contobj_put(cont);
> > > +                       rcu_read_unlock();
> > > +                       audit_free_context(context);
> >
> > I haven't searched the entire patchset, but it seems like the pattern
> > above happens a couple of times in this patchset, yes?  If so would it
> > make sense to wrap the above get/log/put in a helper function?
>
> I've redone the locking with an rcu lock around the get and a spinlock
> around the put.  It occurs to me that putting an rcu lock around the
> whole thing and doing a get without the refcount increment would save
> us the spinlock and put and be fine since we'd be fine with stale but
> consistent information traversing the contobj list from this point to
> report it.  Problem with that is needing to use GFP_ATOMIC due to the
> rcu lock.  If I stick with the spinlock around the put then I can use
> GFP_KERNEL and just grab the spinlock while traversing the contobj list.
>
> > Not a big deal either way, I'm pretty neutral on it at this point in
> > the patchset but thought it might be worth mentioning in case you
> > noticed the same and were on the fence.
>
> There is only one other place this is used, in audit_log_exit in
> auditsc.c.  I had noted the pattern but wasn't sure it was worth it.
> Inline or not?  Should we just let the compiler decide?

I'm generally not a fan of explicit inlines unless it has been shown
to be a real problem.
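
For the sake of discussion, a helper wrapping the get/log/put pattern
could look roughly like the userspace sketch below: a plain static
function with no explicit inline, leaving that decision to the compiler.
Everything here is a hypothetical model; the struct layouts, the
function names and the "contid=%llu" text written into the buffer are
assumptions for illustration and not the patch's code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct contobj {
	unsigned long long id;
	int refcount;
};

struct task {
	struct contobj *cont;
};

/* Take a reference on the task's container object, if it has one. */
static struct contobj *contobj_get(struct task *tsk)
{
	if (tsk->cont)
		tsk->cont->refcount++;
	return tsk->cont;
}

/* Drop a reference taken with contobj_get(). */
static void contobj_put(struct contobj *cont)
{
	if (cont)
		cont->refcount--;
}

/* Format the container id into buf; writes nothing if cont is NULL. */
static int log_container_id(char *buf, size_t len, const struct contobj *cont)
{
	if (!cont) {
		if (len)
			buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "contid=%llu", cont->id);
}

/* The wrapper: one call site instead of a repeated get/log/put triple. */
static int log_task_container_id(char *buf, size_t len, struct task *tsk)
{
	struct contobj *cont = contobj_get(tsk);
	int n = log_container_id(buf, len, cont);

	contobj_put(cont);
	return n;
}
```

Both call sites mentioned above (audit_receive_msg() and
audit_log_exit()) would then reduce to a single call each.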

-- 
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 06/13] audit: add contid support for signalling the audit daemon
  2020-07-29 19:00     ` Richard Guy Briggs
@ 2020-08-21 18:48       ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-08-21 18:48 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Wed, Jul 29, 2020 at 3:00 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-07-05 11:10, Paul Moore wrote:
> > On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > >
> > > Add audit container identifier support to the action of signalling the
> > > audit daemon.
> > >
> > > Since this would need to add an element to the audit_sig_info struct,
> > > a new record type AUDIT_SIGNAL_INFO2 was created with a new
> > > audit_sig_info2 struct.  Corresponding support is required in the
> > > userspace code to reflect the new record request and reply type.
> > > An older userspace won't break since it won't know to request this
> > > record type.
> > >
> > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > ---
> > >  include/linux/audit.h       |  8 ++++
> > >  include/uapi/linux/audit.h  |  1 +
> > >  kernel/audit.c              | 95 ++++++++++++++++++++++++++++++++++++++++++++-
> > >  security/selinux/nlmsgtab.c |  1 +
> > >  4 files changed, 104 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > > index 5eeba0efffc2..89cf7c66abe6 100644
> > > --- a/include/linux/audit.h
> > > +++ b/include/linux/audit.h
> > > @@ -22,6 +22,13 @@ struct audit_sig_info {
> > >         char            ctx[];
> > >  };
> > >
> > > +struct audit_sig_info2 {
> > > +       uid_t           uid;
> > > +       pid_t           pid;
> > > +       u32             cid_len;
> > > +       char            data[];
> > > +};
> > > +
> > >  struct audit_buffer;
> > >  struct audit_context;
> > >  struct inode;
> > > @@ -105,6 +112,7 @@ struct audit_contobj {
> > >         u64                     id;
> > >         struct task_struct      *owner;
> > >         refcount_t              refcount;
> > > +       refcount_t              sigflag;
> > >         struct rcu_head         rcu;
> > >  };
> >
> > It seems like we need some protection in audit_set_contid() so that we
> > don't allow reuse of an audit container ID when "refcount == 0 &&
> > sigflag != 0", yes?
>
> We have it, see -ESHUTDOWN below.

That check in audit_set_contid() is checking ->refcount and not
->sigflag; ->sigflag is more important in this context, yes?
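
To spell out the condition under discussion, here is a minimal
userspace sketch of a reuse check that looks at ->sigflag as well as
->refcount.  It models only the case named above ("refcount == 0 &&
sigflag != 0"); the struct, the function name and the choice of
-ENOTUNIQ as the return value are assumptions for illustration, and
ENOTUNIQ is a Linux errno.

```c
#include <errno.h>

/* Hypothetical snapshot of the two counters on an audit_contobj. */
struct contobj_state {
	unsigned int refcount;	/* tasks still pointing at the object */
	unsigned int sigflag;	/* pending signal-info users of the object */
};

/* Returns 0 if the existing object may be reused by its owner, or
 * -ENOTUNIQ if no task holds it any more but it is still pinned for a
 * pending AUDIT_SIGNAL_INFO2 reply.
 */
static int contid_reuse_check(const struct contobj_state *c)
{
	if (c->refcount == 0 && c->sigflag != 0)
		return -ENOTUNIQ;
	return 0;
}
```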

> > > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > > index fd98460c983f..a56ad77069b9 100644
> > > --- a/include/uapi/linux/audit.h
> > > +++ b/include/uapi/linux/audit.h
> > > @@ -72,6 +72,7 @@
> > >  #define AUDIT_SET_FEATURE      1018    /* Turn an audit feature on or off */
> > >  #define AUDIT_GET_FEATURE      1019    /* Get which features are enabled */
> > >  #define AUDIT_CONTAINER_OP     1020    /* Define the container id and info */
> > > +#define AUDIT_SIGNAL_INFO2     1021    /* Get info auditd signal sender */
> > >
> > >  #define AUDIT_FIRST_USER_MSG   1100    /* Userspace messages mostly uninteresting to kernel */
> > >  #define AUDIT_USER_AVC         1107    /* We filter this differently */
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index a09f8f661234..54dd2cb69402 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -126,6 +126,8 @@ struct auditd_connection {
> > >  kuid_t         audit_sig_uid = INVALID_UID;
> > >  pid_t          audit_sig_pid = -1;
> > >  u32            audit_sig_sid = 0;
> > > +static struct audit_contobj *audit_sig_cid;
> > > +static struct task_struct *audit_sig_atsk;
> >
> > This looks like a typo, or did you mean "atsk" for some reason?
>
> No, I meant atsk to refer specifically to the audit daemon task and not
> > any other random one that is doing the signalling.  I can change it if
> there is a strong objection.

Esh, yeah, "atsk" looks too much like a typo ;)  At the very least add
a 'd' in there, e.g. "adtsk", but something better than that would be
welcome.

> > > @@ -2532,6 +2620,11 @@ int audit_set_contid(struct task_struct *task, u64 contid)
> > >                         if (cont->id == contid) {
> > >                                 /* task injection to existing container */
> > >                                 if (current == cont->owner) {
> > > +                                       if (!refcount_read(&cont->refcount)) {
> > > +                                               rc = -ESHUTDOWN;
> >
> > Reuse -ENOTUNIQ; I'm not overly excited about providing a lot of
> > detail here as these are global system objects.  If you must have a
> > different errno (and I would prefer you didn't), use something like
> > -EBUSY.
>
> I don't understand the issue of "global system objects" since the only
> time this error would be issued is if its own contid were being reused
> but it hadn't cleaned up its own references yet by either issuing an
> > AUDIT_SIGNAL_INFO* request or the targeted audit daemon hadn't cleaned
> up yet.  EBUSY could be confused with already having spawned threads or
> children, and ENOTUNIQ could indicate that another orchestrator/engine
> had stolen its desired contid after we released it and wanted to reuse
> it.

All the more reason for ENOTUNIQ.  The point is that the audit
container ID is not available for use, and since the IDs are shared
across the entire system I think we are better off having some
ambiguity here with errnos.

> This gets me thinking about making reservations for preferred
> contids that are otherwise unavailable and making callbacks to indicate
> when they become available, but that seems undesirably complex right
> now.

That is definitely beyond the scope of this work, or rather *should*
be beyond the scope of this work.

-- 
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 05/13] audit: log container info of syscalls
  2020-07-29 19:40     ` Richard Guy Briggs
@ 2020-08-21 19:15       ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-08-21 19:15 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Wed, Jul 29, 2020 at 3:41 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-07-05 11:10, Paul Moore wrote:
> > On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:

...

> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index f03d3eb0752c..9e79645e5c0e 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -1458,6 +1466,7 @@ static void audit_log_exit(void)
> > >         struct audit_buffer *ab;
> > >         struct audit_aux_data *aux;
> > >         struct audit_names *n;
> > > +       struct audit_contobj *cont;
> > >
> > >         context->personality = current->personality;
> > >
> > > @@ -1541,7 +1550,7 @@ static void audit_log_exit(void)
> > >         for (aux = context->aux_pids; aux; aux = aux->next) {
> > >                 struct audit_aux_data_pids *axs = (void *)aux;
> > >
> > > -               for (i = 0; i < axs->pid_count; i++)
> > > +               for (i = 0; i < axs->pid_count; i++) {
> > >                         if (audit_log_pid_context(context, axs->target_pid[i],
> > >                                                   axs->target_auid[i],
> > >                                                   axs->target_uid[i],
> > > @@ -1549,14 +1558,20 @@ static void audit_log_exit(void)
> > >                                                   axs->target_sid[i],
> > >                                                   axs->target_comm[i]))
> > >                                 call_panic = 1;
> > > +                       audit_log_container_id(context, axs->target_cid[i]);
> > > +               }
> >
> > It might be nice to see an audit event example including the
> > ptrace/signal information.  I'm concerned there may be some confusion
> > about associating the different audit container IDs with the correct
> > information in the event.
>
> This is the subject of ghat81, which is a test for ptrace and signal
> records.
>
> This was the reason I had advocated for an op= field since there is a
> possibility of multiple contid records per event.

I think an "op=" field is the wrong way to link audit container ID to
a particular record.  It may be convenient, but I fear that it would
be overloading the field too much.

Like I said above, I think it would be good to see an audit event
example including the ptrace/signal information.  This way we can talk
about it on-list and hash out the various solutions if it proves to be
a problem.

> > > @@ -1575,6 +1590,14 @@ static void audit_log_exit(void)
> > >
> > >         audit_log_proctitle();
> > >
> > > +       rcu_read_lock();
> > > +       cont = _audit_contobj_get(current);
> > > +       rcu_read_unlock();
> > > +       audit_log_container_id(context, cont);
> > > +       rcu_read_lock();
> > > +       _audit_contobj_put(cont);
> > > +       rcu_read_unlock();
> >
> > Do we need to grab an additional reference for the audit container
> > object here?  We don't create any additional references here that
> > persist beyond the lifetime of this function, right?
>
> Why do we need another reference?  There's one for each pointer pointing
> to it and so far we have just one from this task.  Or are you thinking
> of the contid hash list, which is only added to when a task points to it
> and gets removed from that list when the last task stops pointing to it.
> Later that gets more complicated with network namespaces and nested
> container objects.  For now we just needed it while generating the
> record, then it gets freed.

I don't think we need to grab an additional reference here, that is
why I asked the question.  The code above grabs a reference for the
audit container ID object associated with the current task and then
drops it before returning; if the current task, and its associated
audit container ID object, disappears in the middle of the function
we've got much bigger worries :)

-- 
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 02/13] audit: add container id
  2020-07-29 20:05     ` Richard Guy Briggs
@ 2020-08-21 19:36       ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-08-21 19:36 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: containers, linux-api, Linux-Audit Mailing List, linux-fsdevel,
	LKML, netdev, netfilter-devel, sgrubb, Ondrej Mosnacek, dhowells,
	simo, Eric Paris, Serge Hallyn, ebiederm, nhorman, Dan Walsh,
	mpatel

On Wed, Jul 29, 2020 at 4:06 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-07-05 11:09, Paul Moore wrote:
> > On Sat, Jun 27, 2020 at 9:22 AM Richard Guy Briggs <rgb@redhat.com> wrote:

...

> > > @@ -212,6 +219,33 @@ void __init audit_task_init(void)
> > >                                              0, SLAB_PANIC, NULL);
> > >  }
> > >
> > > +/* rcu_read_lock must be held by caller unless new */
> > > +static struct audit_contobj *_audit_contobj_hold(struct audit_contobj *cont)
> > > +{
> > > +       if (cont)
> > > +               refcount_inc(&cont->refcount);
> > > +       return cont;
> > > +}
> > > +
> > > +static struct audit_contobj *_audit_contobj_get(struct task_struct *tsk)
> > > +{
> > > +       if (!tsk->audit)
> > > +               return NULL;
> > > +       return _audit_contobj_hold(tsk->audit->cont);
> > > +}
> > > +
> > > +/* rcu_read_lock must be held by caller */
> > > +static void _audit_contobj_put(struct audit_contobj *cont)
> > > +{
> > > +       if (!cont)
> > > +               return;
> > > +       if (refcount_dec_and_test(&cont->refcount)) {
> > > +               put_task_struct(cont->owner);
> > > +               list_del_rcu(&cont->list);
> >
> > You should check your locking; I'm used to seeing exclusive locks
> > (e.g. the spinlock) around list adds/removes, it just reads/traversals
> > that can be done with just the RCU lock held.
>
> Ok, I've redone the locking yet again.  I knew this on one level but
> that didn't translate consistently to code...
>
> > > +               kfree_rcu(cont, rcu);
> > > +       }
> > > +}
> >
> > Another nitpick, but it might be nice to have similar arguments to the
> > _get() and _put() functions, e.g. struct audit_contobj, but that is
> > some serious bikeshedding (basically rename _hold() to _get() and
> > rename _hold to audit_task_contid_hold() or similar).
>
> I have some idea what you are trying to say, but I think you misspoke.
> Did you mean rename _hold to _get, rename _get to
> audit_task_contobj_hold()?

It reads okay to me, but I know what I'm intending here :)  I agree it
could be a bit confusing.  Let me try to put my suggestion into some
quick pseudo-code function prototypes to make things a bit more
concrete.

The _audit_contobj_hold() function would become:
   struct audit_contobj *_audit_contobj_hold(struct task_struct *tsk);

The _audit_contobj_get() function would become:
   struct audit_contobj *_audit_contobj_get(struct audit_contobj *cont);

The _audit_contobj_put() function would become:
   void _audit_contobj_put(struct audit_contobj *cont);

Basically swap the _get() and _hold() function names so that the
arguments are the same for both the _get() and _put() functions.  Does
this make more sense?
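
A rough userspace sketch of the naming proposed above, assuming hypothetical names (contobj_get, contobj_put, task_contobj_hold) and a plain counter in place of refcount_t. It leaves out the RCU and spinlock handling discussed earlier in the thread and only shows the shape of the API: _get() and _put() both take the object, while the task-based helper carries the _hold() name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the kernel types in the patch. */
struct contobj {
	unsigned int refcount;
};

struct task {
	struct contobj *cont;
};

/* Object-based get: same argument type as contobj_put(). */
static struct contobj *contobj_get(struct contobj *cont)
{
	if (cont)
		cont->refcount++;
	return cont;
}

/* Object-based put: returns true when the last reference was dropped. */
static bool contobj_put(struct contobj *cont)
{
	if (!cont)
		return false;
	if (--cont->refcount == 0) {
		free(cont);
		return true;
	}
	return false;
}

/* Task-based helper: takes a reference on the task's object, if any. */
static struct contobj *task_contobj_hold(struct task *tsk)
{
	return tsk ? contobj_get(tsk->cont) : NULL;
}

/* A hold followed by a put restores the original count. */
static void demo_get_put(void)
{
	struct contobj *c = malloc(sizeof(*c));
	struct task t;

	assert(c != NULL);
	c->refcount = 1;
	t.cont = c;

	assert(task_contobj_hold(&t) == c);
	assert(c->refcount == 2);
	assert(!contobj_put(c));	/* back to 1, object still alive */
	assert(contobj_put(c));		/* last reference, object freed */
}
```

With matching argument types, a caller can pair contobj_get() and contobj_put() on the same pointer without having to remember which helper wants the task.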

> > >  /**
> > >   * audit_alloc - allocate an audit info block for a task
> > >   * @tsk: task
> > > @@ -232,6 +266,9 @@ int audit_alloc(struct task_struct *tsk)
> > >         }
> > >         info->loginuid = audit_get_loginuid(current);
> > >         info->sessionid = audit_get_sessionid(current);
> > > +       rcu_read_lock();
> > > +       info->cont = _audit_contobj_get(current);
> > > +       rcu_read_unlock();
> >
> > The RCU locks aren't strictly necessary here, are they?  In fact I
> > suppose we could probably just replace the _get() call with a
> > refcount_set(1) just as we do in audit_set_contid(), yes?
>
> I don't understand what you are getting at here.  It needs a *contobj,
> along with bumping up the refcount of the existing contobj.

Sorry, you can disregard.  My mental definition for audit_alloc() is
permanently messed up; I usually double check myself before commenting
on related code, but I must have forgotten here.

-- 
paul moore
www.paul-moore.com


* Re: [PATCH ghak90 V9 11/13] audit: contid check descendancy and nesting
  2020-08-07 17:10     ` Richard Guy Briggs
@ 2020-08-21 20:13       ` Paul Moore
  0 siblings, 0 replies; 42+ messages in thread
From: Paul Moore @ 2020-08-21 20:13 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: nhorman, linux-api, containers, LKML, dhowells,
	Linux-Audit Mailing List, netfilter-devel, ebiederm, simo,
	netdev, linux-fsdevel, Eric Paris, mpatel, Serge Hallyn, aris

On Fri, Aug 7, 2020 at 1:10 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-07-05 11:11, Paul Moore wrote:
> > On Sat, Jun 27, 2020 at 9:23 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > Require the target task to be a descendant of the container
> > > orchestrator/engine.

If you want to get formal about this, you need to define "target" in
the sentence above.  Target of what?

FWIW, I read the above to basically mean that a task can only set the
audit container ID of processes which are beneath it in the "process
tree" where the "process tree" is defined as the relationship between
a parent and children processes such that the children processes are
branches below the parent process.

I have no problem with that, with the understanding that nesting
complicates it somewhat.  For example, this isn't true when one of the
children is a nested orchestrator, is it?
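
As an illustration of the "process tree" reading above, here is a small userspace sketch of a descendancy test. The struct proc type and is_descendant() are hypothetical and are not taken from the patch; the sketch walks parent pointers only and does not attempt to answer the nested-orchestrator question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a process with a link to its parent. */
struct proc {
	int pid;
	const struct proc *parent;	/* NULL for the root of the tree */
};

/* Return true if target sits strictly below ancestor in the tree. */
static bool is_descendant(const struct proc *ancestor,
			  const struct proc *target)
{
	const struct proc *p;

	if (!ancestor || !target)
		return false;
	for (p = target->parent; p; p = p->parent)
		if (p == ancestor)
			return true;
	return false;
}

/* An orchestrator may act on its children, not on itself or its parent. */
static void demo_descendant(void)
{
	struct proc init = { .pid = 1, .parent = NULL };
	struct proc orch = { .pid = 100, .parent = &init };
	struct proc child = { .pid = 101, .parent = &orch };

	assert(is_descendant(&orch, &child));
	assert(!is_descendant(&orch, &orch));
	assert(!is_descendant(&orch, &init));
}
```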

> > > You would only change the audit container ID from one set or inherited
> > > value to another if you were nesting containers.

I thought we decided we were going to allow an orchestrator to move a
process between audit container IDs, yes?  no?

> > > If changing the contid, the container orchestrator/engine must be a
> > > descendant and not same orchestrator as the one that set it so it is not
> > > possible to change the contid of another orchestrator's container.

Try rephrasing the above please, it isn't clear to me what you are
trying to say.

> Are we able to agree on the premises above?  Is anything asserted that
> should not be and is there anything missing?

See above.

If you want to go back to the definitions/assumptions stage, it
probably isn't worth worrying about the other comments until we get
the above sorted.

-- 
paul moore
www.paul-moore.com



Thread overview: 42+ messages
2020-06-27 13:20 [PATCH ghak90 V9 00/13] audit: implement container identifier Richard Guy Briggs
2020-06-27 13:20 ` [PATCH ghak90 V9 01/13] audit: collect audit task parameters Richard Guy Briggs
2020-07-05 15:09   ` Paul Moore
2020-07-07  2:50     ` Richard Guy Briggs
2020-07-08  1:42       ` Paul Moore
2020-07-13 20:29         ` Richard Guy Briggs
2020-07-14  0:44           ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 02/13] audit: add container id Richard Guy Briggs
2020-07-04 13:29   ` Paul Moore
2020-07-04 13:30     ` Paul Moore
2020-07-05 15:09   ` Paul Moore
2020-07-29 20:05     ` Richard Guy Briggs
2020-08-21 19:36       ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 03/13] audit: read container ID of a process Richard Guy Briggs
2020-06-27 13:20 ` [PATCH ghak90 V9 04/13] audit: log drop of contid on exit of last task Richard Guy Briggs
2020-07-05 15:10   ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 05/13] audit: log container info of syscalls Richard Guy Briggs
2020-07-05 15:10   ` Paul Moore
2020-07-29 19:40     ` Richard Guy Briggs
2020-08-21 19:15       ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 06/13] audit: add contid support for signalling the audit daemon Richard Guy Briggs
2020-07-05 15:10   ` Paul Moore
2020-07-29 19:00     ` Richard Guy Briggs
2020-08-21 18:48       ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 07/13] audit: add support for non-syscall auxiliary records Richard Guy Briggs
2020-07-05 15:11   ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 08/13] audit: add containerid support for user records Richard Guy Briggs
2020-07-05 15:11   ` Paul Moore
2020-07-18  0:43     ` Richard Guy Briggs
2020-08-21 18:34       ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 09/13] audit: add containerid filtering Richard Guy Briggs
2020-06-27 13:20 ` [PATCH ghak90 V9 10/13] audit: add support for containerid to network namespaces Richard Guy Briggs
2020-07-05 15:11   ` Paul Moore
2020-07-21 22:05     ` Richard Guy Briggs
2020-06-27 13:20 ` [PATCH ghak90 V9 11/13] audit: contid check descendancy and nesting Richard Guy Briggs
2020-07-05 15:11   ` Paul Moore
2020-08-07 17:10     ` Richard Guy Briggs
2020-08-21 20:13       ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 12/13] audit: track container nesting Richard Guy Briggs
2020-07-05 15:11   ` Paul Moore
2020-06-27 13:20 ` [PATCH ghak90 V9 13/13] audit: add capcontid to set contid outside init_user_ns Richard Guy Briggs
2020-07-05 15:11   ` Paul Moore
