linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Richard Guy Briggs <rgb@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, pmoore@redhat.com,
	linux-audit@redhat.com, eparis@parisplace.org, sgrubb@redhat.com,
	zohar@linux.vnet.ibm.com
Subject: Re: [PATCH V6 00/10] namespaces: log namespaces per task
Date: Mon, 27 Apr 2015 22:05:55 -0400	[thread overview]
Message-ID: <20150428020555.GB20713@madcap2.tricolour.ca> (raw)
In-Reply-To: <87bnid9v4f.fsf@x220.int.ebiederm.org>

On 15/04/24, Eric W. Biederman wrote:
> Richard Guy Briggs <rgb@redhat.com> writes:
> > On 15/04/22, Richard Guy Briggs wrote:
> >> On 15/04/20, Eric W. Biederman wrote:
> >> > Richard Guy Briggs <rgb@redhat.com> writes:
> >> > 
> >> > > The purpose is to track namespace instances in use by logged processes from the
> >> > > perspective of init_*_ns by logging the namespace IDs (device ID and namespace
> >> > > inode - offset).
> >> > 
> >> > In broad strokes the user interface appears correct.
> >> > 
> >> > Things that I see that concern me:
> >> > 
> >> > - After Als most recent changes these inodes no longer live in the proc
> >> >   superblock so the device number reported in these patches is
> >> >   incorrect.
> >> 
> >> Ok, found the patchset you're talking about:
> >> 	3d3d35b kill proc_ns completely
> >> 	e149ed2 take the targets of /proc/*/ns/* symlinks to separate fs
> >> 	f77c801 bury struct proc_ns in fs/proc
> >> 	33c4294 copy address of proc_ns_ops into ns_common
> >> 	6344c43 new helpers: ns_alloc_inum/ns_free_inum
> >> 	6496452 make proc_ns_operations work with struct ns_common * instead of void *
> >> 	3c04118 switch the rest of proc_ns_operations to working with &...->ns
> >> 	ff24870 netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
> >> 	58be2825 make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
> >> 	435d5f4 common object embedded into various struct ....ns
> >> 
> >> Ok, I've got some minor jigging to do to get inum too...
> >
> > Do I even need to report the device number anymore since I am concluding
> > s_dev is never set (or always zero) in the nsfs filesystem by
> > mount_pseudo() and isn't even mountable? 
> 
> We still need the dev. We do have a device number get_anon_bdev fills it in.

Fine, it has a device number.  There appears to be only one of these
allocated per kernel.  I can get it from &nsfs->fs_supers (and take the
first instance given by hlist_for_each_entry and verify there are no
others).  Why do I need it, again?

> > In fact, I never needed to
> > report the device since proc ida/idr and inodes are kernel-global and
> > namespace-oblivious.
> 
> This is the bit I really want to keep to be forward looking.  If we
> every need to preserve the inode numbers across a migration we could
> have different super blocks with different inode numbers for the same
> namespace.

I don't quite follow your argument here, but can accept that in the
future we might add other namespace devices.  I wonder if we might do
that augmentation later and leave out the device number for now...

> >> > - I am nervous about audit logs being flooded with users creating lots
> >> >   of namespaces.  But that is more your lookout than mine.
> >> 
> >> There was a thought to create a filter to en/disable this logging...
> >> It is an auxiliary record to syscalls, so they can be ignored by userspace tools.
> >> 
> >> > - unshare is not logging when it creates new namespaces.
> >> 
> >> They are all covered:
> >> sys_unshare > unshare_userns > create_user_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_mnt_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_utsname > clone_uts_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_ipcs > get_ipc_ns
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_pid_ns > create_pid_namespace
> >> sys_unshare > unshare_nsproxy_namespaces > create_new_namespaces > copy_net_ns
> 
> Then why the special change to fork?  That was not reflected on
> the unshare path as far as I could see.

Fork can specify more than one CLONE flag at once, so collecting them
all in one statementn seemed helpful.  setns can only set one at a time.

> >> > As small numbers are nice and these inodes all live in their own
> >> > superblock now we should be able to remove the games with
> >> > PROC_DYNAMIC_FIRST and just use small numbers for these inodes
> >> > everywhere.
> >> 
> >> That is compelling if I can untangle the proc inode allocation code from the
> >> ida/idr.  Should be as easy as defining a new ns_alloc_inum (and ns_free_inum)
> >> to use instead of proc_alloc_inum with its own ns_inum_ida and ns_inum_lock,
> >> then defining a NS_DYNAMIC_FIRST and defining NS_{IPC,UTS,USER,PID}_INIT_INO in
> >> the place of the existing PROC_*_INIT_INO.
> 
> Something like that.  Just a new ida/idr allocator specific to that
> superblock.
> 
> Yeah.  It is somewhere on my todo, but I have been prioritizing getting
> the bugs that look potentially expoloitable fixed in the mount
> namespace.  Al made things nice for one case but left a mess for a bunch
> of others.
> 
> >> > I honestly don't know how much we are going to care about namespace ids
> >> > during migration.  So far this is not a problem that has come up.
> >> 
> >> Not for CRIU, but it will be an issue for a container auditor that aggregates
> >> information from individually auditted hosts.
> >> 
> >> > I don't think migration becomes a practical concern (other than
> >> > interface wise) until achieve a non-init namespace auditd.  The easy way
> >> > to handle migration would be to log a setns of every process from their
> >> > old namespaces to their new namespaces.  As you appear to have a setns
> >> > event defined.
> >> 
> >> Again, this would be taken care of by a layer above that is container-aware
> >> across multiple hosts.
> 
> 
> >> > How to handle the more general case beyond audit remains unclear.  I
> >> > think it will be a little while yet before we start dealing with
> >> > migrating applications that care.  When we do we will either need to
> >> > generate some kind of hot-plug event that userspace can respond to and
> >> > discover all of the appropriate file-system nodes have changed, or we
> >> > will need to build a mechanism in the kernel to preserve these numbers.
> >> 
> >> I don't expect to need to preserve these numbers.  The higher layer application
> >> will be able to do that translation.
> 
> We need to be very aware of what is happening.
> 
> The situation I am concerned about looks something like.
> 
> Program A:
>   fd1 = open(/proc/self/ns/net);
>   fstat(fd1, &stat1)
> 
>   ... later ...
> 
>   fd2 = open(/var/run/netns/johnny);
>   fstat(fd2, &stat2);
> 
>   if ((stat1.st_dev == stat2.st_dev) &&
>       (stat1.st_ino == stat2.st_ino)) {
> 	/* Same netns do something... */
>   }
> 
> 
> What happens when we migrate Program A with it's cached stat data of
> of a network namespace file?
> 
> This requires either a hotplug event that Program A listens to or that
> the inode number and device number are preserved across migration.
> 
> Exactly what we do depends on where we are when it comes up.  But this
> is not something some layer about the program can abstract it all out so
> we don't need to worry about it.

Ok, understood, we can't just punt this one to a higher layer...

So this comes back to a question above, which is how do we determine
which device it is from?  Sounds like we need something added to
ns_common or one of the 6 namespace types structs.

> Eric

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

  reply	other threads:[~2015-04-28  2:06 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-17  7:35 [PATCH V6 00/10] namespaces: log namespaces per task Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 01/10] namespaces: expose ns_entries Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 02/10] proc_ns: define PROC_*_INIT_INO in terms of PROC_DYNAMIC_FIRST Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 03/10] audit: log namespace ID numbers Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 04/10] audit: initialize at subsystem time rather than device time Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 05/10] audit: log creation and deletion of namespace instances Richard Guy Briggs
2015-05-05 14:22   ` Steve Grubb
2015-05-05 14:31     ` Aristeu Rozanski
2015-05-05 14:46       ` Steve Grubb
2015-05-05 14:56     ` Eric W. Biederman
2015-05-05 15:16       ` Steve Grubb
2015-05-12 19:57     ` Richard Guy Briggs
2015-05-14 14:57       ` Steve Grubb
2015-05-14 15:42         ` Eric W. Biederman
2015-05-14 16:21           ` Steve Grubb
2015-05-15  2:03           ` Richard Guy Briggs
2015-05-14 19:19         ` Paul Moore
2015-05-15  1:31           ` Eric W. Biederman
2015-05-15  2:25             ` Richard Guy Briggs
2015-05-15 13:17             ` Steve Grubb
2015-05-15 14:51               ` Eric W. Biederman
2015-05-15 21:01             ` Paul Moore
2015-05-15  2:32           ` Richard Guy Briggs
2015-05-15  6:23             ` Andy Lutomirski
2015-05-15 12:38               ` Steve Grubb
2015-05-15 13:17                 ` Andy Lutomirski
2015-05-15 21:05               ` Paul Moore
2015-05-16  9:46                 ` Daniel J Walsh
2015-05-16 12:16                   ` Paul Moore
2015-05-16 14:46                     ` Eric W. Biederman
2015-05-16 22:49                       ` Paul Moore
2015-05-19 13:09                         ` Richard Guy Briggs
2015-05-19 14:27                           ` Paul Moore
2015-05-15  0:48         ` Richard Guy Briggs
2015-05-15 20:26           ` Paul Moore
     [not found]           ` <CAA4jN2bgynVTwF+owtXgq06JMLQJpy_qokpD0mAguNYeDxmh1A@mail.gmail.com>
2015-05-15  2:11             ` Richard Guy Briggs
2015-05-15 13:19               ` Daniel J Walsh
2015-05-15 20:42             ` Paul Moore
2015-04-17  7:35 ` [PATCH V6 06/10] audit: dump namespace IDs for pid on receipt of AUDIT_NS_INFO Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 07/10] sched: add a macro to ref all CLONE_NEW* flags Richard Guy Briggs
2015-04-17  8:18   ` Peter Zijlstra
2015-04-17 15:42     ` Richard Guy Briggs
2015-04-17 17:41       ` Peter Zijlstra
2015-04-17 22:00         ` Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 08/10] fork: audit on creation of new namespace(s) Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 09/10] audit: log on switching namespace (setns) Richard Guy Briggs
2015-04-17  7:35 ` [PATCH V6 10/10] audit: emit AUDIT_NS_INFO record with AUDIT_VIRT_CONTROL record Richard Guy Briggs
2015-04-21  4:33 ` [PATCH V6 00/10] namespaces: log namespaces per task Eric W. Biederman
2015-04-23  3:07   ` Richard Guy Briggs
2015-04-23 20:44     ` Richard Guy Briggs
2015-04-24 19:36       ` Eric W. Biederman
2015-04-28  2:05         ` Richard Guy Briggs [this message]
2015-04-28  2:16           ` Eric W. Biederman
2015-05-08 14:42             ` Richard Guy Briggs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150428020555.GB20713@madcap2.tricolour.ca \
    --to=rgb@redhat.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=ebiederm@xmission.com \
    --cc=eparis@parisplace.org \
    --cc=linux-audit@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pmoore@redhat.com \
    --cc=sgrubb@redhat.com \
    --cc=zohar@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).