From: Richard Guy Briggs <rgb@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: containers@lists.linux-foundation.org,
linux-kernel@vger.kernel.org, pmoore@redhat.com,
linux-audit@redhat.com, eparis@parisplace.org, sgrubb@redhat.com,
zohar@linux.vnet.ibm.com
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task
Date: Mon, 27 Apr 2015 22:05:55 -0400 [thread overview]
Message-ID: <20150428020555.GB20713@madcap2.tricolour.ca> (raw)
In-Reply-To: <87bnid9v4f.fsf@x220.int.ebiederm.org>
On 15/04/24, Eric W. Biederman wrote:
> Richard Guy Briggs <rgb@redhat.com> writes:
> > On 15/04/22, Richard Guy Briggs wrote:
> >> On 15/04/20, Eric W. Biederman wrote:
> >> > Richard Guy Briggs <rgb@redhat.com> writes:
> >> >
> >> > > The purpose is to track namespace instances in use by logged processes from the
> >> > > perspective of init_*_ns by logging the namespace IDs (device ID and namespace
> >> > > inode - offset).
> >> >
> >> > In broad strokes the user interface appears correct.
> >> >
> >> > Things that I see that concern me:
> >> >
> >> > - After Als most recent changes these inodes no longer live in the proc
> >> > superblock so the device number reported in these patches is
> >> > incorrect.
> >>
> >> Ok, found the patchset you're talking about:
> >> 3d3d35b kill proc_ns completely
> >> e149ed2 take the targets of /proc/*/ns/* symlinks to separate fs
> >> f77c801 bury struct proc_ns in fs/proc
> >> 33c4294 copy address of proc_ns_ops into ns_common
> >> 6344c43 new helpers: ns_alloc_inum/ns_free_inum
> >> 6496452 make proc_ns_operations work with struct ns_common * instead of void *
> >> 3c04118 switch the rest of proc_ns_operations to working with &...->ns
> >> ff24870 netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
> >> 58be2825 make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
> >> 435d5f4 common object embedded into various struct ....ns
> >>
> >> Ok, I've got some minor jigging to do to get inum too...
> >
> > Do I even need to report the device number anymore since I am concluding
> > s_dev is never set (or always zero) in the nsfs filesystem by
> > mount_pseudo() and isn't even mountable?
>
> We still need the dev. We do have a device number get_anon_bdev fills it in.
Fine, it has a device number. There appears to be only one of these
allocated per kernel. I can get it from &nsfs->fs_supers (and take the
first instance given by hlist_for_each_entry and verify there are no
others). Why do I need it, again?
> > In fact, I never needed to
> > report the device since proc ida/idr and inodes are kernel-global and
> > namespace-oblivious.
>
> This is the bit I really want to keep to be forward looking. If we
> every need to preserve the inode numbers across a migration we could
> have different super blocks with different inode numbers for the same
> namespace.
I don't quite follow your argument here, but can accept that in the
future we might add other namespace devices. I wonder if we might do
that augmentation later and leave out the device number for now...
> >> > - I am nervous about audit logs being flooded with users creating lots
> >> > of namespaces. But that is more your lookout than mine.
> >>
> >> There was a thought to create a filter to en/disable this logging...
> >> It is an auxiliary record to syscalls, so they can be ignored by userspace tools.
> >>
> >> > - unshare is not logging when it creates new namespaces.
> >>
> >> They are all covered:
> >> sys_unshare > unshare_userns > create_user_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_mnt_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_utsname > clone_uts_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_ipcs > get_ipc_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_pid_ns > create_pid_namespace
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_net_ns
>
> Then why the special change to fork? That was not reflected on
> the unshare path as far as I could see.
Fork can specify more than one CLONE flag at once, so collecting them
all in one statementn seemed helpful. setns can only set one at a time.
> >> > As small numbers are nice and these inodes all live in their own
> >> > superblock now we should be able to remove the games with
> >> > PROC_DYNAMIC_FIRST and just use small numbers for these inodes
> >> > everywhere.
> >>
> >> That is compelling if I can untangle the proc inode allocation code from the
> >> ida/idr. Should be as easy as defining a new ns_alloc_inum (and ns_free_inum)
> >> to use instead of proc_alloc_inum with its own ns_inum_ida and ns_inum_lock,
> >> then defining a NS_DYNAMIC_FIRST and defining NS_{IPC,UTS,USER,PID}_INIT_INO in
> >> the place of the existing PROC_*_INIT_INO.
>
> Something like that. Just a new ida/idr allocator specific to that
> superblock.
>
> Yeah. It is somewhere on my todo, but I have been prioritizing getting
> the bugs that look potentially expoloitable fixed in the mount
> namespace. Al made things nice for one case but left a mess for a bunch
> of others.
>
> >> > I honestly don't know how much we are going to care about namespace ids
> >> > during migration. So far this is not a problem that has come up.
> >>
> >> Not for CRIU, but it will be an issue for a container auditor that aggregates
> >> information from individually auditted hosts.
> >>
> >> > I don't think migration becomes a practical concern (other than
> >> > interface wise) until achieve a non-init namespace auditd. The easy way
> >> > to handle migration would be to log a setns of every process from their
> >> > old namespaces to their new namespaces. As you appear to have a setns
> >> > event defined.
> >>
> >> Again, this would be taken care of by a layer above that is container-aware
> >> across multiple hosts.
>
>
> >> > How to handle the more general case beyond audit remains unclear. I
> >> > think it will be a little while yet before we start dealing with
> >> > migrating applications that care. When we do we will either need to
> >> > generate some kind of hot-plug event that userspace can respond to and
> >> > discover all of the appropriate file-system nodes have changed, or we
> >> > will need to build a mechanism in the kernel to preserve these numbers.
> >>
> >> I don't expect to need to preserve these numbers. The higher layer application
> >> will be able to do that translation.
>
> We need to be very aware of what is happening.
>
> The situation I am concerned about looks something like.
>
> Program A:
> fd1 = open(/proc/self/ns/net);
> fstat(fd1, &stat1)
>
> ... later ...
>
> fd2 = open(/var/run/netns/johnny);
> fstat(fd2, &stat2);
>
> if ((stat1.st_dev == stat2.st_dev) &&
> (stat1.st_ino == stat2.st_ino)) {
> /* Same netns do something... */
> }
>
>
> What happens when we migrate Program A with it's cached stat data of
> of a network namespace file?
>
> This requires either a hotplug event that Program A listens to or that
> the inode number and device number are preserved across migration.
>
> Exactly what we do depends on where we are when it comes up. But this
> is not something some layer about the program can abstract it all out so
> we don't need to worry about it.
Ok, understood, we can't just punt this one to a higher layer...
So this comes back to a question above, which is how do we determine
which device it is from? Sounds like we need something added to
ns_common or one of the 6 namespace types structs.
> Eric
- RGB
--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
next prev parent reply other threads:[~2015-04-28 2:06 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-17 7:35 [PATCH V6 00/10] namespaces: log namespaces per task Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 01/10] namespaces: expose ns_entries Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 02/10] proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 03/10] audit: log namespace ID numbers Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 04/10] audit: initialize at subsystem time rather than device time Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 05/10] audit: log creation and deletion of namespace instances Richard Guy Briggs
2015-05-05 14:22 ` Steve Grubb
2015-05-05 14:31 ` Aristeu Rozanski
2015-05-05 14:46 ` Steve Grubb
2015-05-05 14:56 ` Eric W. Biederman
2015-05-05 15:16 ` Steve Grubb
2015-05-12 19:57 ` Richard Guy Briggs
2015-05-14 14:57 ` Steve Grubb
2015-05-14 15:42 ` Eric W. Biederman
2015-05-14 16:21 ` Steve Grubb
2015-05-15 2:03 ` Richard Guy Briggs
2015-05-14 19:19 ` Paul Moore
2015-05-15 1:31 ` Eric W. Biederman
2015-05-15 2:25 ` Richard Guy Briggs
2015-05-15 13:17 ` Steve Grubb
2015-05-15 14:51 ` Eric W. Biederman
2015-05-15 21:01 ` Paul Moore
2015-05-15 2:32 ` Richard Guy Briggs
2015-05-15 6:23 ` Andy Lutomirski
2015-05-15 12:38 ` Steve Grubb
2015-05-15 13:17 ` Andy Lutomirski
2015-05-15 21:05 ` Paul Moore
2015-05-16 9:46 ` Daniel J Walsh
2015-05-16 12:16 ` Paul Moore
2015-05-16 14:46 ` Eric W. Biederman
2015-05-16 22:49 ` Paul Moore
2015-05-19 13:09 ` Richard Guy Briggs
2015-05-19 14:27 ` Paul Moore
2015-05-15 0:48 ` Richard Guy Briggs
2015-05-15 20:26 ` Paul Moore
[not found] ` <CAA4jN2bgynVTwF+owtXgq06JMLQJpy_qokpD0mAguNYeDxmh1A@mail.gmail.com>
2015-05-15 2:11 ` Richard Guy Briggs
2015-05-15 13:19 ` Daniel J Walsh
2015-05-15 20:42 ` Paul Moore
2015-04-17 7:35 ` [PATCH V6 06/10] audit: dump namespace IDs for pid on receipt of AUDIT_NS_INFO Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 07/10] sched: add a macro to ref all CLONE_NEW* flags Richard Guy Briggs
2015-04-17 8:18 ` Peter Zijlstra
2015-04-17 15:42 ` Richard Guy Briggs
2015-04-17 17:41 ` Peter Zijlstra
2015-04-17 22:00 ` Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 08/10] fork: audit on creation of new namespace(s) Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 09/10] audit: log on switching namespace (setns) Richard Guy Briggs
2015-04-17 7:35 ` [PATCH V6 10/10] audit: emit AUDIT_NS_INFO record with AUDIT_VIRT_CONTROL record Richard Guy Briggs
2015-04-21 4:33 ` [PATCH V6 00/10] namespaces: log namespaces per task Eric W. Biederman
2015-04-23 3:07 ` Richard Guy Briggs
2015-04-23 20:44 ` Richard Guy Briggs
2015-04-24 19:36 ` Eric W. Biederman
2015-04-28 2:05 ` Richard Guy Briggs [this message]
2015-04-28 2:16 ` Eric W. Biederman
2015-05-08 14:42 ` Richard Guy Briggs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150428020555.GB20713@madcap2.tricolour.ca \
--to=rgb@redhat.com \
--cc=containers@lists.linux-foundation.org \
--cc=ebiederm@xmission.com \
--cc=eparis@parisplace.org \
--cc=linux-audit@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=pmoore@redhat.com \
--cc=sgrubb@redhat.com \
--cc=zohar@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).