linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Casey Schaufler <casey@schaufler-ca.com>
To: David Howells <dhowells@redhat.com>, viro@zeniv.linux.org.uk
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	raven@themaw.net, keyrings@vger.kernel.org,
	linux-security-module@vger.kernel.org
Subject: Re: [RFC][PATCH 0/5] Mount, Filesystem and Keyrings notifications
Date: Mon, 23 Jul 2018 09:31:27 -0700	[thread overview]
Message-ID: <675e5c24-36ef-4cc5-846c-1414c1195d85@schaufler-ca.com> (raw)
In-Reply-To: <153235954191.32640.5792167066538704794.stgit@warthog.procyon.org.uk>

On 7/23/2018 8:25 AM, David Howells wrote:
> Hi Al,
>
> Here's a set of patches to add a general variable-length notification queue
> concept and to add sources of events for:

Overall I approve. The interface is a bit clunky. Some concerns below.

>
>  (1) Mount topology and reconfiguration change events.

With the possibility of unprivileged mounting you're
going to have to address access control on events.
If root in a user namespace mounts a filesystem you
may have a case where the "real" user wouldn't want the
listener to receive a notification.

>  (2) Superblocks EIO, ENOSPC and EDQUOT events (not complete yet).

Here, too. If SELinux (for example) policy says you can't see
anything on a filesystem you shouldn't get notifications about
things that happen to that filesystem.

>  (3) Key/keyring changes events

And again, I should only get notifications about keys and
keyrings I have access to.

I expect that you intentionally left off

   (4) User injected events

at this point, but it's an obvious extension. That is going
to require access controls (remember kdbus) so I think you'd
do well to design them in now rather than have some security
module hack like me come along later and "fix" it. 

> One of the reasons for this is so that we can remove the issue of processes
> having to repeatedly and regularly scan /proc/mounts, which has proven to be a
> system performance problem.
>
>
> Design decisions:
>
>  (1) A misc chardev is used to create and open a ring buffer:
>
> 	fd = open("/dev/watch_queue", O_RDWR);
>
>      which is then configured and mmap'd into userspace:
>
> 	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
> 	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
> 	buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
> 		   MAP_SHARED, fd, 0);
>
>      The fd cannot be read or written (though there is a facility to use write
>      to inject records for debugging) and userspace just pulls data directly
>      out of the buffer.
>
>  (2) The ring index pointers are stored inside the ring and are thus
>      accessible to userspace.  Userspace should only update the tail pointer
>      and never the head pointer or risk breaking the buffer.  The kernel
>      checks that the pointers appear valid before trying to use them.  A
>      'skip' record is maintained around the pointers.
>
>  (3) poll() can be used to wait for data to appear in the buffer.
>
>  (4) Records in the buffer are binary, typed and have a length so that they
>      can be of varying size.
>
>      This means that multiple heterogeneous sources can share a common
>      buffer.  Tags may be specified when a watchpoint is created to help
>      distinguish the sources.
>
>  (5) The queue is reusable as there are 16 million types available, of which
>      I've used 4, so there is scope for others to be used.
>
>  (6) Records are filterable as types have up to 256 subtypes that can be
>      individually filtered.  Other filtration is also available.
>
>  (7) Each time the buffer is opened, a new buffer is created - this means that
>      there's no interference between watchers.
>
>  (8) When recording a notification, the kernel will not sleep, but will rather
>      mark a queue as overrun if there's insufficient space, thereby avoiding
>      userspace causing the kernel to hang.
>
>  (9) The 'watchpoint' should be specific where possible, meaning that you
>      specify the object that you want to watch.
>
> (10) The buffer is created and then watchpoints are attached to it, using one
>      of:
>
> 	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
> 	mount_notify(AT_FDCWD, "/", 0, fd, 0x02);
> 	sb_notify(AT_FDCWD, "/mnt", 0, fd, 0x03);
>
>      where in all three cases, fd indicates the queue and the number after is
>      a tag between 0 and 255.
>
> (11) The watch must be removed if either the watch buffer is destroyed or the
>      watched object is destroyed.
>
>
> Things I want to avoid:
>
>  (1) Introducing features that make the core VFS dependent on the network
>      stack or networking namespaces (ie. usage of netlink).
>
>  (2) Dumping all this stuff into dmesg and having a daemon that sits there
>      parsing the output and distributing it as this then puts the
>      responsibility for security into userspace and makes handling namespaces
>      tricky.  Further, dmesg might not exist or might be inaccessible inside a
>      container.
>
>  (3) Letting users see events they shouldn't be able to see.
>
>
> Further things that need to be done:
>
>  (1) fsinfo() syscall needs to find superblocks by ID as well as by path so
>      that it can query a superblock for information without the need to try
>      and work out how to reach it - if the calling process even can.
>
>  (2) A mount_info() syscall is needed that can enumerate all the children of a
>      mount.  This is necessary because mountpoints can hide each other by
>      stacking, so paths are not unique keys.  This will require the ability to
>      look up a mount by ID.  This avoids the need to parse /proc/mounts.
>
>  (3) A keyctl call is needed to allow a watch on a keyring to be extended to
>      "children" of that keyring, such that the watch is removed from the child
>      if it is unlinked from the keyring.
>
>  (4) A global superblock event queue maybe?
>
>  (5) Propagating watches to child superblock over automounts?
>
>
> The patches can be found here also:
>
> 	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications
>
> David
> ---
> David Howells (5):
>       General notification queue with user mmap()'able ring buffer
>       KEYS: Add a notification facility
>       vfs: Add a mount-notification facility
>       vfs: Add superblock notifications
>       Add sample notification program
>
>
>  Documentation/security/keys/core.rst   |   59 ++
>  Documentation/watch_queue.rst          |  305 ++++++++++++
>  arch/x86/entry/syscalls/syscall_32.tbl |    2 
>  arch/x86/entry/syscalls/syscall_64.tbl |    2 
>  drivers/misc/Kconfig                   |    9 
>  drivers/misc/Makefile                  |    1 
>  drivers/misc/watch_queue.c             |  835 ++++++++++++++++++++++++++++++++
>  fs/Kconfig                             |   21 +
>  fs/Makefile                            |    1 
>  fs/fs_context.c                        |    1 
>  fs/mount.h                             |   26 +
>  fs/mount_notify.c                      |  178 +++++++
>  fs/namespace.c                         |   18 +
>  fs/super.c                             |  116 ++++
>  include/linux/dcache.h                 |    1 
>  include/linux/fs.h                     |   77 +++
>  include/linux/key.h                    |    4 
>  include/linux/syscalls.h               |    4 
>  include/linux/watch_queue.h            |   87 +++
>  include/uapi/linux/keyctl.h            |    1 
>  include/uapi/linux/watch_queue.h       |  156 ++++++
>  kernel/sys_ni.c                        |    6 
>  mm/interval_tree.c                     |    2 
>  mm/memory.c                            |    1 
>  samples/Kconfig                        |    6 
>  samples/Makefile                       |    2 
>  samples/watch_queue/Makefile           |    9 
>  samples/watch_queue/watch_test.c       |  232 +++++++++
>  security/keys/Kconfig                  |   10 
>  security/keys/compat.c                 |    3 
>  security/keys/gc.c                     |    5 
>  security/keys/internal.h               |   29 +
>  security/keys/key.c                    |   37 +
>  security/keys/keyctl.c                 |   90 +++
>  security/keys/keyring.c                |   17 -
>  security/keys/request_key.c            |    4 
>  36 files changed, 2332 insertions(+), 25 deletions(-)
>  create mode 100644 Documentation/watch_queue.rst
>  create mode 100644 drivers/misc/watch_queue.c
>  create mode 100644 fs/mount_notify.c
>  create mode 100644 include/linux/watch_queue.h
>  create mode 100644 include/uapi/linux/watch_queue.h
>  create mode 100644 samples/watch_queue/Makefile
>  create mode 100644 samples/watch_queue/watch_test.c
>
>

  parent reply	other threads:[~2018-07-23 17:33 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-23 15:25 [RFC][PATCH 0/5] Mount, Filesystem and Keyrings notifications David Howells
2018-07-23 15:25 ` [PATCH 1/5] General notification queue with user mmap()'able ring buffer David Howells
2018-07-23 15:25 ` [PATCH 2/5] KEYS: Add a notification facility David Howells
2018-07-23 15:26 ` [PATCH 3/5] vfs: Add a mount-notification facility David Howells
2018-07-23 15:26 ` [PATCH 4/5] vfs: Add superblock notifications David Howells
2018-07-23 15:26 ` [PATCH 5/5] Add sample notification program David Howells
2018-07-23 16:31 ` Casey Schaufler [this message]
2018-07-24  0:37   ` [RFC][PATCH 0/5] Mount, Filesystem and Keyrings notifications Ian Kent
2018-07-24 16:00 ` David Howells
2018-07-24 18:57   ` Casey Schaufler
2018-07-25  5:39     ` Ian Kent
2018-07-25 15:48       ` Casey Schaufler
2018-07-26  1:18         ` Ian Kent
2018-07-26 16:09           ` Casey Schaufler
2018-07-24 19:22   ` David Howells
2018-08-01 21:04 ` LSM hook for mount, superblock and keys watch notifications David Howells
2018-08-01 21:49   ` Casey Schaufler
2018-08-01 22:50   ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=675e5c24-36ef-4cc5-846c-1414c1195d85@schaufler-ca.com \
    --to=casey@schaufler-ca.com \
    --cc=dhowells@redhat.com \
    --cc=keyrings@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=raven@themaw.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).