From: Christian Brauner <christian.brauner@ubuntu.com>
To: David Howells <dhowells@redhat.com>
Cc: torvalds@linux-foundation.org, viro@zeniv.linux.org.uk,
dray@redhat.com, kzak@redhat.com, mszeredi@redhat.com,
swhiteho@redhat.com, jlayton@redhat.com, raven@themaw.net,
andres@anarazel.de, jarkko.sakkinen@linux.intel.com,
keyrings@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [GIT PULL] General notification queue and key notifications
Date: Wed, 10 Jun 2020 09:56:02 +0000
Message-ID: <20200610095602.hjzyvehx5vkasavt@wittgenstein>
In-Reply-To: <1503686.1591113304@warthog.procyon.org.uk>

On Tue, Jun 02, 2020 at 04:55:04PM +0100, David Howells wrote:
> Date: Tue, 02 Jun 2020 16:51:44 +0100
>
> Hi Linus,
>
> Can you pull this, please? It adds a general notification queue concept
> and adds an event source for keys/keyrings, such as linking and unlinking
> keys and changing their attributes.
>
> Thanks to Debarshi Ray, we do have a pull request to use this to fix a
> problem with gnome-online-accounts - as mentioned last time:
>
> https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47
>
> Without this, g-o-a has to constantly poll a keyring-based kerberos cache
> to find out if kinit has changed anything.
>
> [[ With regard to the mount/sb notifications and fsinfo(), Karel Zak and

The mount/sb notification and fsinfo() work is something we'd like to
use (and then later extend to allow for supervised mounts, where a
container manager can supervise the mounts of an unprivileged
container).

I'm not sure whether the mount notifications are already part of this PR.

Christian

> Ian Kent have been working on making libmount use them, preparatory to
> working on systemd:
>
> https://github.com/karelzak/util-linux/commits/topic/fsinfo
> https://github.com/raven-au/util-linux/commits/topic/fsinfo.public
>
> Development has stalled briefly due to other commitments, so I'm not
> sure I can ask you to pull those parts of the series for now. Christian
> Brauner would like to use them in lxc, but hasn't started.
> ]]
>
>
> LSM hooks are included:
>
> (1) A set of hooks are provided that allow an LSM to rule on whether or
> not a watch may be set. Each of these hooks takes a different
> "watched object" parameter, so they're not really shareable. The LSM
> should use current's credentials. [Wanted by SELinux & Smack]
>
> (2) A hook is provided to allow an LSM to rule on whether or not a
> particular message may be posted to a particular queue. This is given
> the credentials from the event generator (which may be the system) and
> the watch setter. [Wanted by Smack]
>
> I've provided SELinux and Smack with implementations of some of these hooks.
>
>
> WHY
> ===
> Key/keyring notifications are desirable because if you have your kerberos
> tickets in a file/directory, your Gnome desktop will monitor that using
> something like fanotify and tell you if your credentials cache changes.
>
> However, we also have the ability to cache your kerberos tickets in the
> session, user or persistent keyring so that it isn't left around on disk
> across a reboot or logout. Keyrings, however, cannot currently be
> monitored asynchronously, so the desktop has to poll for it - not so good
> on a laptop. This facility will allow the desktop to avoid the need to
> poll.
>
>
> DESIGN DECISIONS
> ================
>
> (1) The notification queue is built on top of a standard pipe. Messages
> are effectively spliced in. The pipe is opened with a special flag:
>
> pipe2(fds, O_NOTIFICATION_PIPE);
>
> The special flag has the same value as O_EXCL (which doesn't seem like
> it will ever be applicable in this context)[?]. It is given up front
> to make it a lot easier to prohibit splice and co. from accessing the
> pipe.
>
> [?] Should this be done some other way? I'd rather not use up a new
> O_* flag if I can avoid it - should I add a pipe3() system call
> instead?
>
> The pipe is then configured:
>
> ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
> ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
>
> Messages are then read out of the pipe using read().
>
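
For readers following along, the sequence in (1) can be sketched roughly
as below. This is only a sketch under stated assumptions:
open_notification_pipe() is a hypothetical helper name, the fallback value
for O_NOTIFICATION_PIPE follows the statement above that it shares its
value with O_EXCL, and the ioctl number is an assumption that should be
checked against include/uapi/linux/watch_queue.h. On a kernel without
notification pipe support, pipe2() is expected to fail and the helper
simply reports that.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Per the quoted text, the flag shares its value with O_EXCL. */
#ifndef O_NOTIFICATION_PIPE
#define O_NOTIFICATION_PIPE O_EXCL
#endif

/* Assumed ioctl number; verify against include/uapi/linux/watch_queue.h. */
#ifndef IOC_WATCH_QUEUE_SET_SIZE
#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60)
#endif

/*
 * Hypothetical helper: create a notification pipe and size its queue.
 * Returns 0 on success, or -1 with errno set (for example when the
 * running kernel does not support notification pipes).
 */
int open_notification_pipe(int fds[2], unsigned int queue_depth)
{
	assert(fds != NULL);

	if (pipe2(fds, O_NOTIFICATION_PIPE) == -1)
		return -1;

	if (ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE,
		  (unsigned long)queue_depth) == -1) {
		int saved = errno;	/* close() may overwrite errno */

		close(fds[0]);
		close(fds[1]);
		errno = saved;
		return -1;
	}
	return 0;
}
```

On success, a caller would attach watches to fds[1] and then read()
messages from fds[0]; a filter can be installed the same way with
IOC_WATCH_QUEUE_SET_FILTER, which is left out of this sketch.
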
> (2) It should be possible to allow write() to insert data into the
> notification pipes too, but this is currently disabled as the kernel
> has to be able to insert messages into the pipe *without* holding
> pipe->mutex and the code to make this work needs careful auditing.
>
> (3) sendfile(), splice() and vmsplice() are disabled on notification pipes
> because of the pipe->mutex issue and also because they sometimes want
> to revert what they just did - but one or more notification messages
> might've been interleaved in the ring.
>
> (4) The kernel inserts messages with the wait queue spinlock held. This
> means that pipe_read() and pipe_write() have to take the spinlock to
> update the queue pointers.
>
> (5) Records in the buffer are binary, typed and have a length so that they
> can be of varying size.
>
> This allows multiple heterogeneous sources to share a common buffer;
> there are 16 million types available, of which I've used just a few,
> so there is scope for others to be used. Tags may be specified when a
> watchpoint is created to help distinguish the sources.
>
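
To make (5) concrete, here is a minimal sketch of walking a buffer of
typed, length-carrying records of varying size. The header layout
(struct demo_record) is an illustration invented for this example and is
NOT the kernel's uapi definition; only the 24-bit type and 8-bit subtype
split mirrors the "16 million types" and "256 subtypes" figures in the
quoted text. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 8-byte header; the real layout lives in the uapi header. */
struct demo_record {
	uint32_t type_subtype;	/* type in bits 8..31, subtype in bits 0..7 */
	uint8_t  len;		/* total record length, header included */
	uint8_t  tag;		/* tag given when the watch was created */
	uint16_t reserved;
};

/* Decoded view of one record; payload points into the caller's buffer. */
struct demo_view {
	uint32_t type;
	uint8_t subtype;
	uint8_t tag;
	const uint8_t *payload;
	size_t payload_len;
};

/*
 * Decode the record at *off and advance *off past it.
 * Returns 1 on success, 0 if the remaining bytes do not hold a whole
 * record or the record declares an impossible length.
 */
int demo_next_record(const uint8_t *buf, size_t len, size_t *off,
		     struct demo_view *out)
{
	struct demo_record hdr;

	assert(buf != NULL && off != NULL && out != NULL);
	if (*off > len || len - *off < sizeof(hdr))
		return 0;

	memcpy(&hdr, buf + *off, sizeof(hdr));	/* avoid unaligned access */
	if (hdr.len < sizeof(hdr) || hdr.len > len - *off)
		return 0;

	out->type = hdr.type_subtype >> 8;
	out->subtype = hdr.type_subtype & 0xff;
	out->tag = hdr.tag;
	out->payload = buf + *off + sizeof(hdr);
	out->payload_len = hdr.len - sizeof(hdr);
	*off += hdr.len;
	return 1;
}
```

A reader loop would call demo_next_record() repeatedly on the bytes
returned by read() until it returns 0, dispatching on type and tag.
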
> (6) Records are filterable as types have up to 256 subtypes that can be
> individually filtered. Other filtration is also available.
>
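
One way to picture the per-subtype filtering in (6) is a 256-bit bitmap,
one bit per subtype. The sketch below assumes that representation purely
for illustration; the struct and function names are hypothetical, and the
"other filtration" mentioned above is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per subtype: 256 subtypes fit in eight 32-bit words. */
struct demo_subtype_filter {
	uint32_t bits[8];
};

/* Mark a subtype as wanted. */
void demo_filter_allow(struct demo_subtype_filter *f, uint8_t subtype)
{
	assert(f != NULL);
	f->bits[subtype >> 5] |= UINT32_C(1) << (subtype & 31);
}

/* Return 1 if records with this subtype should be delivered, else 0. */
int demo_filter_match(const struct demo_subtype_filter *f, uint8_t subtype)
{
	assert(f != NULL);
	return (int)((f->bits[subtype >> 5] >> (subtype & 31)) & 1);
}
```

Because subtype is a uint8_t, the word index is always in 0..7, so no
bounds check is needed beyond the type itself.
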
> (7) Notification pipes don't interfere with each other; each may be bound
> to a different set of watches. Any particular notification will be
> copied to all the queues that are currently watching for it - and only
> those that are watching for it.
>
> (8) When recording a notification, the kernel will not sleep, but will
> rather mark a queue as having lost a message if there's insufficient
> space. read() will fabricate a loss notification message at an
> appropriate point later.
>
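
The behaviour in (8) can be modelled in userspace as a bounded queue
whose producer never blocks: on overflow it drops the message and sets a
flag, and the consumer later receives a synthetic loss marker. This is a
single-threaded toy model with hypothetical names, not the kernel code;
in particular, delivering the loss marker only after the queued messages
have been drained is my assumption about what "an appropriate point"
could mean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_SLOTS 4
#define DEMO_MSG_LOSS (-1)	/* hypothetical "messages were lost" marker */

struct demo_queue {
	int slots[DEMO_QUEUE_SLOTS];
	size_t head, tail;	/* free-running write and read counters */
	bool lost;		/* set when a post found the queue full */
};

/* Producer side: never waits; on overflow, note the loss and drop. */
bool demo_post(struct demo_queue *q, int msg)
{
	assert(q != NULL);
	if (q->head - q->tail == DEMO_QUEUE_SLOTS) {
		q->lost = true;
		return false;
	}
	q->slots[q->head++ % DEMO_QUEUE_SLOTS] = msg;
	return true;
}

/*
 * Consumer side: hand out queued messages first, then report a pending
 * loss exactly once. Returns false when there is nothing left to read.
 */
bool demo_read(struct demo_queue *q, int *msg)
{
	assert(q != NULL && msg != NULL);
	if (q->head != q->tail) {
		*msg = q->slots[q->tail++ % DEMO_QUEUE_SLOTS];
		return true;
	}
	if (q->lost) {
		q->lost = false;
		*msg = DEMO_MSG_LOSS;
		return true;
	}
	return false;
}
```

The point of the flag is that any number of dropped messages collapses
into one loss indication, so the producer's cost stays constant.
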
> (9) The notification pipe is created and then watchpoints are attached to
> it, using one of:
>
> keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
> watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
> watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);
>
> where in each case, the file descriptor (fds[1] or fd) indicates the
> queue and the final argument is a tag between 0 and 255.
>
> (10) Watches are removed if either the notification pipe is destroyed or
> the watched object is destroyed. In the latter case, a message will
> be generated indicating the enforced watch removal.
>
>
> Things I want to avoid:
>
> (1) Introducing features that make the core VFS dependent on the network
> stack or networking namespaces (ie. usage of netlink).
>
> (2) Dumping all this stuff into dmesg and having a daemon that sits there
> parsing the output and distributing it as this then puts the
> responsibility for security into userspace and makes handling
> namespaces tricky. Further, dmesg might not exist or might be
> inaccessible inside a container.
>
> (3) Letting users see events they shouldn't be able to see.
>
>
> TESTING AND MANPAGES
> ====================
>
> (*) The keyutils tree has a pipe-watch branch that has keyctl commands for
> making use of notifications. Proposed manual pages can also be found
> on this branch, though a couple of them really need to go to the main
> manpages repository instead.
>
> If the kernel supports the watching of keys, then running "make test"
> on that branch will cause the testing infrastructure to spawn a
> monitoring process on the side that monitors a notifications pipe for
> all the key/keyring changes induced by the tests and they'll all be
> checked off to make sure they happened.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch
>
> (*) A test program is provided (samples/watch_queue/watch_test) that can
> be used to monitor for keyrings, mount and superblock events.
> Information on the notifications is simply logged to stdout.
>
> Thanks,
> David
> ---
> The following changes since commit b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce:
>
> Linux 5.7-rc6 (2020-05-17 16:48:37 -0700)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-20200601
>
> for you to fetch changes up to a8478a602913dc89a7cd2060e613edecd07e1dbd:
>
> smack: Implement the watch_key and post_notification hooks (2020-05-19 15:47:38 +0100)
>
> ----------------------------------------------------------------
> Notifications over pipes + Keyring notifications
>
> ----------------------------------------------------------------
> David Howells (12):
> uapi: General notification queue definitions
> security: Add a hook for the point of notification insertion
> pipe: Add O_NOTIFICATION_PIPE
> pipe: Add general notification queue support
> security: Add hooks to rule on setting a watch
> watch_queue: Add a key/keyring notification facility
> Add sample notification program
> pipe: Allow buffers to be marked read-whole-or-error for notifications
> pipe: Add notification lossage handling
> keys: Make the KEY_NEED_* perms an enum rather than a mask
> selinux: Implement the watch_key security hook
> smack: Implement the watch_key and post_notification hooks
>
> Documentation/security/keys/core.rst | 57 ++
> Documentation/userspace-api/ioctl/ioctl-number.rst | 1 +
> Documentation/watch_queue.rst | 339 +++++++++++
> fs/pipe.c | 242 +++++---
> fs/splice.c | 12 +-
> include/linux/key.h | 33 +-
> include/linux/lsm_audit.h | 1 +
> include/linux/lsm_hook_defs.h | 9 +
> include/linux/lsm_hooks.h | 14 +
> include/linux/pipe_fs_i.h | 27 +-
> include/linux/security.h | 30 +-
> include/linux/watch_queue.h | 127 ++++
> include/uapi/linux/keyctl.h | 2 +
> include/uapi/linux/watch_queue.h | 104 ++++
> init/Kconfig | 12 +
> kernel/Makefile | 1 +
> kernel/watch_queue.c | 659 +++++++++++++++++++++
> samples/Kconfig | 6 +
> samples/Makefile | 1 +
> samples/watch_queue/Makefile | 7 +
> samples/watch_queue/watch_test.c | 186 ++++++
> security/keys/Kconfig | 9 +
> security/keys/compat.c | 3 +
> security/keys/gc.c | 5 +
> security/keys/internal.h | 38 +-
> security/keys/key.c | 38 +-
> security/keys/keyctl.c | 115 +++-
> security/keys/keyring.c | 20 +-
> security/keys/permission.c | 31 +-
> security/keys/process_keys.c | 46 +-
> security/keys/request_key.c | 4 +-
> security/security.c | 22 +-
> security/selinux/hooks.c | 51 +-
> security/smack/smack_lsm.c | 112 +++-
> 34 files changed, 2185 insertions(+), 179 deletions(-)
> create mode 100644 Documentation/watch_queue.rst
> create mode 100644 include/linux/watch_queue.h
> create mode 100644 include/uapi/linux/watch_queue.h
> create mode 100644 kernel/watch_queue.c
> create mode 100644 samples/watch_queue/Makefile
> create mode 100644 samples/watch_queue/watch_test.c
>