From: Paul Moore <paul@paul-moore.com>
To: Richard Guy Briggs <rgb@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>,
linux-api@vger.kernel.org, containers@lists.linux-foundation.org,
LKML <linux-kernel@vger.kernel.org>,
dhowells@redhat.com,
Linux-Audit Mailing List <linux-audit@redhat.com>,
netfilter-devel@vger.kernel.org, ebiederm@xmission.com,
simo@redhat.com, netdev@vger.kernel.org,
linux-fsdevel@vger.kernel.org, Eric Paris <eparis@parisplace.org>,
Serge Hallyn <serge@hallyn.com>
Subject: Re: [PATCH ghak90 V5 09/10] audit: add support for containerid to network namespaces
Date: Thu, 4 Apr 2019 22:06:10 -0400 [thread overview]
Message-ID: <CAHC9VhSf_7RH96dNhVhifs3uqEZf0Uq=Qi5TgqdVtUskiqYQ4Q@mail.gmail.com> (raw)
In-Reply-To: <20190404214006.jcgmjb4u6iobu42s@madcap2.tricolour.ca>
On Thu, Apr 4, 2019 at 5:40 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2019-04-02 07:31, Neil Horman wrote:
> > On Mon, Apr 01, 2019 at 10:50:03AM -0400, Paul Moore wrote:
> > > On Fri, Mar 15, 2019 at 2:35 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > > Audit events could happen in a network namespace outside of a task
> > > > context due to packets received from the net that trigger an auditing
> > > > rule prior to being associated with a running task. The network
> > > > namespace could be in use by multiple containers by association to the
> > > > tasks in that network namespace. We still want a way to attribute
> > > > these events to any potential containers. Keep a list per network
> > > > namespace to track these audit container identifiiers.
> > > >
> > > > Add/increment the audit container identifier on:
> > > > - initial setting of the audit container identifier via /proc
> > > > - clone/fork call that inherits an audit container identifier
> > > > - unshare call that inherits an audit container identifier
> > > > - setns call that inherits an audit container identifier
> > > > Delete/decrement the audit container identifier on:
> > > > - an inherited audit container identifier dropped when child set
> > > > - process exit
> > > > - unshare call that drops a net namespace
> > > > - setns call that drops a net namespace
> > > >
> > > > See: https://github.com/linux-audit/audit-kernel/issues/92
> > > > See: https://github.com/linux-audit/audit-testsuite/issues/64
> > > > See: https://github.com/linux-audit/audit-kernel/wiki/RFE-Audit-Container-ID
> > > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > > ---
> > > > include/linux/audit.h | 19 ++++++++++++
> > > > kernel/audit.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > > > kernel/nsproxy.c | 4 +++
> > > > 3 files changed, 106 insertions(+), 3 deletions(-)
...
> > > > + list_for_each_entry_rcu(cont, contid_list, list)
> > > > + if (cont->id == contid) {
> > > > + refcount_inc(&cont->refcount);
> > > > + goto out;
> > > > + }
> > > > + cont = kmalloc(sizeof(struct audit_contid), GFP_ATOMIC);
> > >
> > > If you had to guess, what do you think is going to be more common:
> > > bumping the refcount of an existing entry in the list, or adding a new
> > > entry? I'm asking because I always get a little nervous when doing
> > > allocations while holding a spinlock. Yes, you are doing it with
> > > GFP_ATOMIC, but it still seems like something to try and avoid if this
> > > is going to approach 50%. However, if the new entry is rare then the
> > > extra work of always doing the allocation before taking the lock and
> > > then freeing it afterwards might be a bad tradeoff.
> > >
> > I think this is another way of asking, will multiple processes exist in the same
> > network namespace? That is to say, will we expect a process to create a net
> > namespace, and then have other processes enter that namespace (thereby
> > triggering multiple calls to audit_netns_contid_add with the same net pointer
> > and cont id). Given that the kubernetes notion of a pod is almost by definition
> > multiple containers sharing a network namespace, I think the answer is that we
> > will be strongly biased towards the refcount_inc case, rather than the kmalloc
> > case. I could be wrong, but I think this is likely already in the optimized
> > order.
>
> I had a stab at doing a GFP_KERNEL alloc before the spinlock and releasing it
> after if it wasn't needed (in Patch#1 below). I also went one step further and
> converted it to kmem_cache (in Patch#2 below).
>
> > > My gut feeling says we might do about as many allocations as refcount
> > > bumps, but I could be thinking about this wrong.
> > >
> > > Moving the allocation outside the spinlock might also open the door to
> > > doing this as GFP_KERNEL, which is a good thing, but I haven't looked
> > > at the callers to see if that is possible (it may not be). That's an
> > > exercise left to the patch author (if he hasn't done that already).
>
> Both appear to work, but after successfully running both the contid test and
> audit_testsuite, once I start to push that test system a bit harder I end up
> with a deadlock warning.
>
> I am having trouble understanding why since it happens both without and with
> the kmem_cache options, so it must be another part of the code that is
> triggering it. The task_lock is being held at this point in
> audit_set_contid(). I haven't tried changing this alloc over to a GFP_ATOMIC
> to see if that prevents it, just as a debugging check...
> At this point, I'm inclined to leave it as it was without these two patches
> since it works and there doesn't seem to be an obvious best way given the
> uncertainty of the potential workloads.
I'm definitely not a mm expert, but I wonder if you might be able to
get this working using GFP_NOFS, but I see just now that they
recommend you *not* use GFP_NOFS; even if this did solve the problem,
it's probably not the right approach.
I'm okay with leaving it as-is with GFP_ATOMIC. My original response
to this was more of a question about optimizing for a given use case,
having something that works is far more important.
--
paul moore
www.paul-moore.com
next prev parent reply other threads:[~2019-04-05 2:06 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-15 18:29 [PATCH ghak90 V5 00/10] audit: implement container identifier Richard Guy Briggs
2019-03-15 18:29 ` [PATCH ghak90 V5 01/10] audit: collect audit task parameters Richard Guy Briggs
2019-03-16 19:57 ` Neil Horman
2019-03-27 20:33 ` Ondrej Mosnacek
2019-03-15 18:29 ` [PATCH ghak90 V5 02/10] audit: add container id Richard Guy Briggs
2019-03-16 20:00 ` Neil Horman
2019-03-27 20:38 ` Ondrej Mosnacek
2019-03-27 20:44 ` Richard Guy Briggs
2019-03-15 18:29 ` [PATCH ghak90 V5 03/10] audit: read container ID of a process Richard Guy Briggs
2019-03-18 11:10 ` Neil Horman
2019-03-18 18:17 ` Richard Guy Briggs
2019-03-18 18:48 ` Neil Horman
2019-03-18 18:54 ` Richard Guy Briggs
2019-03-27 20:44 ` Ondrej Mosnacek
2019-03-15 18:29 ` [PATCH ghak90 V5 04/10] audit: log container info of syscalls Richard Guy Briggs
2019-03-16 22:44 ` Neil Horman
2019-03-27 21:01 ` Ondrej Mosnacek
2019-03-27 22:10 ` Richard Guy Briggs
2019-03-15 18:29 ` [PATCH ghak90 V5 05/10] audit: add containerid support for ptrace and signals Richard Guy Briggs
2019-03-18 19:04 ` Neil Horman
2019-03-18 19:29 ` Richard Guy Briggs
2019-03-27 21:17 ` Ondrej Mosnacek
2019-03-28 2:04 ` Richard Guy Briggs
2019-03-30 12:55 ` Richard Guy Briggs
2019-03-15 18:29 ` [PATCH ghak90 V5 06/10] audit: add support for non-syscall auxiliary records Richard Guy Briggs
2019-03-18 19:34 ` Neil Horman
2019-03-27 21:22 ` Ondrej Mosnacek
2019-04-01 14:49 ` Paul Moore
2019-04-01 17:44 ` Richard Guy Briggs
2019-04-01 18:57 ` Paul Moore
2019-04-01 20:43 ` Richard Guy Briggs
2019-03-15 18:29 ` [PATCH ghak90 V5 07/10] audit: add containerid support for user records Richard Guy Briggs
2019-03-18 19:41 ` Neil Horman
2019-03-27 21:30 ` Ondrej Mosnacek
2019-03-15 18:29 ` [PATCH ghak90 V5 08/10] audit: add containerid filtering Richard Guy Briggs
2019-03-18 20:02 ` Ondrej Mosnacek
2019-03-18 23:47 ` Richard Guy Briggs
2019-03-27 21:41 ` Ondrej Mosnacek
2019-03-27 22:00 ` Richard Guy Briggs
2019-03-18 20:39 ` Neil Horman
2019-03-15 18:29 ` [PATCH ghak90 V5 09/10] audit: add support for containerid to network namespaces Richard Guy Briggs
2019-03-18 20:56 ` Neil Horman
2019-03-27 22:42 ` Ondrej Mosnacek
2019-03-28 1:12 ` Richard Guy Briggs
2019-03-28 8:01 ` Ondrej Mosnacek
2019-03-28 15:46 ` Paul Moore
2019-03-28 21:40 ` Richard Guy Briggs
2019-03-28 22:00 ` Paul Moore
2019-03-31 2:11 ` Neil Horman
2019-03-29 14:50 ` Neil Horman
2019-03-29 14:49 ` Neil Horman
2019-04-01 14:50 ` Paul Moore
2019-04-01 20:41 ` Richard Guy Briggs
2019-04-02 11:31 ` Neil Horman
2019-04-02 13:31 ` Paul Moore
2019-04-02 14:28 ` Neil Horman
2019-04-04 21:40 ` Richard Guy Briggs
2019-04-05 2:06 ` Paul Moore [this message]
2019-04-05 11:32 ` Neil Horman
2019-03-15 18:29 ` [PATCH ghak90 V5 10/10] audit: NETFILTER_PKT: record each container ID associated with a netNS Richard Guy Briggs
2019-03-18 20:58 ` Neil Horman
2019-03-27 22:52 ` Ondrej Mosnacek
2019-04-01 14:50 ` Paul Moore
2019-04-01 17:50 ` Richard Guy Briggs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAHC9VhSf_7RH96dNhVhifs3uqEZf0Uq=Qi5TgqdVtUskiqYQ4Q@mail.gmail.com' \
--to=paul@paul-moore.com \
--cc=containers@lists.linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=ebiederm@xmission.com \
--cc=eparis@parisplace.org \
--cc=linux-api@vger.kernel.org \
--cc=linux-audit@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=nhorman@tuxdriver.com \
--cc=rgb@redhat.com \
--cc=serge@hallyn.com \
--cc=simo@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).