Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
From: Paul Moore <paul@paul-moore.com>
To: Richard Guy Briggs <rgb@redhat.com>
Cc: containers@lists.linux-foundation.org, linux-api@vger.kernel.org,
	Linux-Audit Mailing List <linux-audit@redhat.com>,
	linux-fsdevel@vger.kernel.org,
	LKML <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, netfilter-devel@vger.kernel.org,
	sgrubb@redhat.com, omosnace@redhat.com, dhowells@redhat.com,
	simo@redhat.com, Eric Paris <eparis@parisplace.org>,
	Serge Hallyn <serge@hallyn.com>,
	ebiederm@xmission.com, nhorman@tuxdriver.com,
	Dan Walsh <dwalsh@redhat.com>,
	mpatel@redhat.com
Subject: Re: [PATCH ghak90 V8 04/16] audit: convert to contid list to check for orch/engine ownership
Date: Wed, 22 Jan 2020 16:28:16 -0500
Message-ID: <CAHC9VhRRW2fFcgBs-R_BZ7ZWCP5wsXA9DB1RUM=QeKj2xZkS2Q@mail.gmail.com> (raw)
In-Reply-To: <a911acf0b209c05dc156fb6b57f9da45778747ce.1577736799.git.rgb@redhat.com>

On Tue, Dec 31, 2019 at 2:50 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>
> Store the audit container identifier in a refcounted kernel object that
> is added to the master list of audit container identifiers.  This will
> allow multiple container orchestrators/engines to work on the same
> machine without danger of inadvertantly re-using an existing identifier.
> It will also allow an orchestrator to inject a process into an existing
> container by checking if the original container owner is the one
> injecting the task.  A hash table list is used to optimize searches.
>
> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> ---
>  include/linux/audit.h | 14 ++++++--
>  kernel/audit.c        | 98 ++++++++++++++++++++++++++++++++++++++++++++++++---
>  kernel/audit.h        |  8 +++++
>  3 files changed, 112 insertions(+), 8 deletions(-)

...

> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index a045b34ecf44..0e6dbe943ae4 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -94,10 +94,18 @@ struct audit_ntp_data {
>  struct audit_ntp_data {};
>  #endif
>
> +struct audit_contobj {
> +       struct list_head        list;
> +       u64                     id;
> +       struct task_struct      *owner;
> +       refcount_t              refcount;
> +       struct rcu_head         rcu;
> +};
> +
>  struct audit_task_info {
>         kuid_t                  loginuid;
>         unsigned int            sessionid;
> -       u64                     contid;
> +       struct audit_contobj    *cont;
>  #ifdef CONFIG_AUDITSYSCALL
>         struct audit_context    *ctx;
>  #endif
> @@ -203,9 +211,9 @@ static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
>
>  static inline u64 audit_get_contid(struct task_struct *tsk)
>  {
> -       if (!tsk->audit)
> +       if (!tsk->audit || !tsk->audit->cont)
>                 return AUDIT_CID_UNSET;
> -       return tsk->audit->contid;
> +       return tsk->audit->cont->id;
>  }
>
>  extern u32 audit_enabled;
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 2d7707426b7d..4bab20f5f781 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -212,6 +218,31 @@ void __init audit_task_init(void)
>                                              0, SLAB_PANIC, NULL);
>  }
>
> +static struct audit_contobj *_audit_contobj(struct task_struct *tsk)
> +{
> +       if (!tsk->audit)
> +               return NULL;
> +       return tsk->audit->cont;

It seems like it would be safer to grab a reference here (e.g.
_audit_contobj_hold(...)), yes?  Or are you confident we will never
have tsk go away while the caller is holding on to the returned audit
container ID object?

> +}
> +
> +/* audit_contobj_list_lock must be held by caller unless new */
> +static void _audit_contobj_hold(struct audit_contobj *cont)
> +{
> +       refcount_inc(&cont->refcount);
> +}

If we are going to call the matching decrement function "_put" it
seems like we might want to call the function about "_get".  Further,
we can also have it return an audit_contobj pointer in case the caller
needs to do an assignment as well (which seems typical if you need to
bump the refcount):

  _audit_contobj_get(audit_contobj *cont)
  {
    if (cont)
      refcount_inc(cont);
    return cont;
  }

> +/* audit_contobj_list_lock must be held by caller */
> +static void _audit_contobj_put(struct audit_contobj *cont)
> +{
> +       if (!cont)
> +               return;
> +       if (refcount_dec_and_test(&cont->refcount)) {
> +               put_task_struct(cont->owner);
> +               list_del_rcu(&cont->list);
> +               kfree_rcu(cont, rcu);
> +       }
> +}
> +
>  /**
>   * audit_alloc - allocate an audit info block for a task
>   * @tsk: task
> @@ -232,7 +263,11 @@ int audit_alloc(struct task_struct *tsk)
>         }
>         info->loginuid = audit_get_loginuid(current);
>         info->sessionid = audit_get_sessionid(current);
> -       info->contid = audit_get_contid(current);
> +       spin_lock(&audit_contobj_list_lock);
> +       info->cont = _audit_contobj(current);
> +       if (info->cont)
> +               _audit_contobj_hold(info->cont);
> +       spin_unlock(&audit_contobj_list_lock);

If we are taking a spinlock in order to bump the refcount, does it
really need to be a refcount_t or can we just use a normal integer?
In RCU protected lists a spinlock is usually used to protect
adds/removes to the list, not the content of individual list items.

My guess is you probably want to use the spinlock as described above
(list add/remove protection) and manipulate the refcount_t inside a
RCU read lock protected region.

>         tsk->audit = info;
>
>         ret = audit_alloc_syscall(tsk);
> @@ -267,6 +302,9 @@ void audit_free(struct task_struct *tsk)
>         /* Freeing the audit_task_info struct must be performed after
>          * audit_log_exit() due to need for loginuid and sessionid.
>          */
> +       spin_lock(&audit_contobj_list_lock);
> +       _audit_contobj_put(tsk->audit->cont);
> +       spin_unlock(&audit_contobj_list_lock);

This is another case of refcount_t vs normal integer in a spinlock
protected region.

>         info = tsk->audit;
>         tsk->audit = NULL;
>         kmem_cache_free(audit_task_cache, info);
> @@ -2365,6 +2406,9 @@ int audit_signal_info(int sig, struct task_struct *t)
>   *
>   * Returns 0 on success, -EPERM on permission failure.
>   *
> + * If the original container owner goes away, no task injection is
> + * possible to an existing container.
> + *
>   * Called (set) from fs/proc/base.c::proc_contid_write().
>   */
>  int audit_set_contid(struct task_struct *task, u64 contid)
> @@ -2381,9 +2425,12 @@ int audit_set_contid(struct task_struct *task, u64 contid)
>         }
>         oldcontid = audit_get_contid(task);
>         read_lock(&tasklist_lock);
> -       /* Don't allow the audit containerid to be unset */
> +       /* Don't allow the contid to be unset */
>         if (!audit_contid_valid(contid))
>                 rc = -EINVAL;
> +       /* Don't allow the contid to be set to the same value again */
> +       else if (contid == oldcontid) {
> +               rc = -EADDRINUSE;

First, is that brace a typo?  It looks like it.  Did this compile?

Second, can you explain why (re)setting the audit container ID to the
same value is something we need to prohibit?  I'm guessing it has
something to do with explicitly set vs inherited, but I don't want to
assume too much about your thinking behind this.

If it is "set vs inherited", would allowing an orchestrator to
explicitly "set" an inherited audit container ID provide some level or
protection against moving the task?

>         /* if we don't have caps, reject */
>         else if (!capable(CAP_AUDIT_CONTROL))
>                 rc = -EPERM;
> @@ -2396,8 +2443,49 @@ int audit_set_contid(struct task_struct *task, u64 contid)
>         else if (audit_contid_set(task))
>                 rc = -ECHILD;
>         read_unlock(&tasklist_lock);
> -       if (!rc)
> -               task->audit->contid = contid;
> +       if (!rc) {
> +               struct audit_contobj *oldcont = _audit_contobj(task);
> +               struct audit_contobj *cont = NULL, *newcont = NULL;
> +               int h = audit_hash_contid(contid);
> +
> +               rcu_read_lock();
> +               list_for_each_entry_rcu(cont, &audit_contid_hash[h], list)
> +                       if (cont->id == contid) {
> +                               /* task injection to existing container */
> +                               if (current == cont->owner) {

Do we have any protection against the task pointed to by cont->owner
going away and a new task with the same current pointer value (no
longer the legitimate audit container ID owner) manipulating the
target task's audit container ID?

> +                                       spin_lock(&audit_contobj_list_lock);
> +                                       _audit_contobj_hold(cont);
> +                                       spin_unlock(&audit_contobj_list_lock);

More of the recount_t vs integer/spinlock question.

> +                                       newcont = cont;
> +                               } else {
> +                                       rc = -ENOTUNIQ;
> +                                       goto conterror;
> +                               }
> +                               break;
> +                       }
> +               if (!newcont) {
> +                       newcont = kmalloc(sizeof(*newcont), GFP_ATOMIC);
> +                       if (newcont) {
> +                               INIT_LIST_HEAD(&newcont->list);
> +                               newcont->id = contid;
> +                               get_task_struct(current);
> +                               newcont->owner = current;

newcont->owner = get_task_struct(current);

(This is what I was talking about above with returning the struct
pointer in the _get/_hold function)

> +                               refcount_set(&newcont->refcount, 1);
> +                               spin_lock(&audit_contobj_list_lock);
> +                               list_add_rcu(&newcont->list, &audit_contid_hash[h]);
> +                               spin_unlock(&audit_contobj_list_lock);

I think we might have a problem where multiple tasks could race adding
the same audit container ID and since there is no check inside the
spinlock protected region we could end up adding multiple instances of
the same audit container ID, yes?

> +                       } else {
> +                               rc = -ENOMEM;
> +                               goto conterror;
> +                       }
> +               }
> +               task->audit->cont = newcont;
> +               spin_lock(&audit_contobj_list_lock);
> +               _audit_contobj_put(oldcont);
> +               spin_unlock(&audit_contobj_list_lock);

More of the refcount_t/integer/spinlock question.


> +conterror:
> +               rcu_read_unlock();
> +       }
>         task_unlock(task);
>
>         if (!audit_enabled)

--
paul moore
www.paul-moore.com

  reply index

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-31 19:48 [PATCH ghak90 V8 00/16] audit: implement container identifier Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 01/16] audit: collect audit task parameters Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 02/16] audit: add container id Richard Guy Briggs
2020-01-22 21:28   ` Paul Moore
2020-01-30 17:53     ` Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 03/16] audit: read container ID of a process Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 04/16] audit: convert to contid list to check for orch/engine ownership Richard Guy Briggs
2020-01-22 21:28   ` Paul Moore [this message]
2020-02-04 22:51     ` Richard Guy Briggs
2020-02-05 22:40       ` Paul Moore
2019-12-31 19:48 ` [PATCH ghak90 V8 05/16] audit: log drop of contid on exit of last task Richard Guy Briggs
2020-01-22 21:28   ` Paul Moore
2020-02-04 23:02     ` Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 06/16] audit: log container info of syscalls Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 07/16] audit: add contid support for signalling the audit daemon Richard Guy Briggs
2020-01-22 21:28   ` Paul Moore
2020-01-23 16:29     ` Richard Guy Briggs
2020-01-23 17:09       ` Paul Moore
2020-01-23 20:04         ` Richard Guy Briggs
2020-01-23 21:35           ` Paul Moore
2020-02-04 23:14             ` Richard Guy Briggs
2020-02-05 22:50               ` Paul Moore
2020-02-12 22:38                 ` Steve Grubb
2020-02-13  0:09                   ` Paul Moore
2020-02-13 21:44                     ` Paul Moore
2020-03-12 19:30                       ` Richard Guy Briggs
2020-03-13 16:29                         ` Paul Moore
2020-03-13 18:59                           ` Richard Guy Briggs
2020-03-18 20:56                             ` Paul Moore
2020-03-18 21:26                               ` Richard Guy Briggs
2020-03-18 21:42                                 ` Paul Moore
2020-03-18 21:55                                   ` Richard Guy Briggs
2020-03-18 22:06                                     ` Paul Moore
2020-03-19 22:02                                       ` Richard Guy Briggs
2020-03-24  0:16                                         ` Paul Moore
2020-03-24 21:01                                           ` Richard Guy Briggs
2020-03-29  3:11                                             ` Paul Moore
2020-03-30 13:47                                               ` Richard Guy Briggs
2020-03-30 14:26                                                 ` Paul Moore
2020-03-30 16:21                                                   ` Richard Guy Briggs
2020-03-30 17:34                                                     ` Paul Moore
2020-03-30 17:49                                                       ` Richard Guy Briggs
2020-03-30 19:55                                                         ` Paul Moore
2020-04-16 20:33                                                           ` Eric W. Biederman
2020-04-16 21:53                                                             ` Paul Moore
2020-04-17 22:23                                                               ` Eric W. Biederman
2020-04-22 17:24                                                                 ` Paul Moore
2020-06-08 18:03                                                                   ` Richard Guy Briggs
2020-06-17 21:33                                                                     ` Paul Moore
2020-06-19 15:22                                                                 ` Richard Guy Briggs
2020-03-12 20:27                     ` Richard Guy Briggs
2020-03-13 16:42                       ` Paul Moore
2020-03-13 16:45                         ` Steve Grubb
2020-03-13 16:49                           ` Paul Moore
2020-03-13 19:23                         ` Richard Guy Briggs
2020-03-18 21:01                           ` Paul Moore
2020-03-18 21:41                             ` Richard Guy Briggs
2020-03-18 21:47                               ` Paul Moore
2020-03-19 21:47                                 ` Richard Guy Briggs
2020-03-20 21:56                                   ` Paul Moore
2020-03-25 12:29                                     ` Richard Guy Briggs
2020-03-29  3:17                                       ` Paul Moore
2020-03-30 15:23                                         ` Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 08/16] audit: add support for non-syscall auxiliary records Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 09/16] audit: add containerid support for user records Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 10/16] audit: add containerid filtering Richard Guy Briggs
2019-12-31 19:48 ` [PATCH ghak90 V8 11/16] audit: add support for containerid to network namespaces Richard Guy Briggs
2020-01-22 21:28   ` Paul Moore
2020-02-04 23:42     ` Richard Guy Briggs
2020-02-05 22:51       ` Paul Moore
2019-12-31 19:48 ` [PATCH ghak90 V8 12/16] audit: contid check descendancy and nesting Richard Guy Briggs
2020-01-22 21:29   ` Paul Moore
2020-01-23 21:02     ` Richard Guy Briggs
2020-01-23 21:47       ` Paul Moore
2019-12-31 19:48 ` [PATCH ghak90 V8 13/16] audit: track container nesting Richard Guy Briggs
2020-01-22 21:29   ` Paul Moore
2020-01-30 19:27     ` Richard Guy Briggs
2020-02-05 23:05       ` Paul Moore
2020-02-05 23:50         ` Richard Guy Briggs
2020-02-13 21:49           ` Paul Moore
2020-03-12 20:51             ` Richard Guy Briggs
2020-03-13 16:47               ` Paul Moore
2020-03-14 22:42                 ` Richard Guy Briggs
2020-03-17 18:28                   ` Richard Guy Briggs
2020-03-18 21:08                   ` Paul Moore
2020-01-31 14:50     ` Steve Grubb
2020-02-04 13:19       ` Richard Guy Briggs
2020-02-04 15:47         ` Steve Grubb
2020-02-04 15:52           ` Paul Moore
2020-02-04 18:12             ` Steve Grubb
2020-02-05 22:57               ` Paul Moore
2019-12-31 19:48 ` [PATCH ghak90 V8 14/16] audit: check contid depth and add limit config param Richard Guy Briggs
2020-01-22 21:29   ` Paul Moore
2019-12-31 19:48 ` [PATCH ghak90 V8 15/16] audit: check contid count per netns and add config param limit Richard Guy Briggs
2020-01-22 21:29   ` Paul Moore
2019-12-31 19:48 ` [PATCH ghak90 V8 16/16] audit: add capcontid to set contid outside init_user_ns Richard Guy Briggs
2020-01-22 21:29   ` Paul Moore
2020-02-05  0:39     ` Richard Guy Briggs
2020-02-05 22:56       ` Paul Moore
2020-02-06 12:51         ` Richard Guy Briggs
2020-02-13 21:58           ` Paul Moore
2020-03-12 21:58             ` Richard Guy Briggs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHC9VhRRW2fFcgBs-R_BZ7ZWCP5wsXA9DB1RUM=QeKj2xZkS2Q@mail.gmail.com' \
    --to=paul@paul-moore.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=dhowells@redhat.com \
    --cc=dwalsh@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=eparis@parisplace.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-audit@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatel@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=omosnace@redhat.com \
    --cc=rgb@redhat.com \
    --cc=serge@hallyn.com \
    --cc=sgrubb@redhat.com \
    --cc=simo@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git