linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] General notification queue and key notifications
@ 2020-06-02 15:55 David Howells
  2020-06-03  2:15 ` Ian Kent
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: David Howells @ 2020-06-02 15:55 UTC (permalink / raw)
  To: torvalds
  Cc: dhowells, viro, dray, kzak, mszeredi, swhiteho, jlayton, raven,
	andres, christian.brauner, jarkko.sakkinen, keyrings,
	linux-fsdevel, linux-kernel

Date: Tue, 02 Jun 2020 16:51:44 +0100

Hi Linus,

Can you pull this, please?  It adds a general notification queue concept
and adds an event source for keys/keyrings, such as linking and unlinking
keys and changing their attributes.

Thanks to Debarshi Ray, we do have a pull request to use this to fix a
problem with gnome-online-accounts - as mentioned last time:

    https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

Without this, g-o-a has to constantly poll a keyring-based kerberos cache
to find out if kinit has changed anything.

[[ With regard to the mount/sb notifications and fsinfo(), Karel Zak and
   Ian Kent have been working on making libmount use them, preparatory to
   working on systemd:

	https://github.com/karelzak/util-linux/commits/topic/fsinfo
	https://github.com/raven-au/util-linux/commits/topic/fsinfo.public

   Development has stalled briefly due to other commitments, so I'm not
   sure I can ask you to pull those parts of the series for now.  Christian
   Brauner would like to use them in lxc, but hasn't started.
   ]]


LSM hooks are included:

 (1) A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set.  Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable.  The LSM
     should use current's credentials.  [Wanted by SELinux & Smack]

 (2) A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue.  This is given
     the credentials from the event generator (which may be the system) and
     the watch setter.  [Wanted by Smack]

I've provided SELinux and Smack with implementations of some of these hooks.


WHY
===

Key/keyring notifications are desirable because if you have your kerberos
tickets in a file/directory, your Gnome desktop will monitor that using
something like fanotify and tell you if your credentials cache changes.

However, we also have the ability to cache your kerberos tickets in the
session, user or persistent keyring so that it isn't left around on disk
across a reboot or logout.  Keyrings, however, cannot currently be
monitored asynchronously, so the desktop has to poll for it - not so good
on a laptop.  This facility will allow the desktop to avoid the need to
poll.


DESIGN DECISIONS
================

 (1) The notification queue is built on top of a standard pipe.  Messages
     are effectively spliced in.  The pipe is opened with a special flag:

	pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem like
     it will ever be applicable in this context)[?].  It is given up front
     to make it a lot easier to prohibit splice and co. from accessing the
     pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
     	 O_* flag if I can avoid it - should I add a pipe3() system call
     	 instead?

     The pipe is then configured::

	ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
	ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

 (2) It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the kernel
     has to be able to insert messages into the pipe *without* holding
     pipe->mutex and the code to make this work needs careful auditing.

 (3) sendfile(), splice() and vmsplice() are disabled on notification pipes
     because of the pipe->mutex issue and also because they sometimes want
     to revert what they just did - but one or more notification messages
     might've been interleaved in the ring.

 (4) The kernel inserts messages with the wait queue spinlock held.  This
     means that pipe_read() and pipe_write() have to take the spinlock to
     update the queue pointers.

 (5) Records in the buffer are binary, typed and have a length so that they
     can be of varying size.

     This allows multiple heterogeneous sources to share a common buffer;
     there are 16 million types available, of which I've used just a few,
     so there is scope for others to be used.  Tags may be specified when a
     watchpoint is created to help distinguish the sources.

 (6) Records are filterable as types have up to 256 subtypes that can be
     individually filtered.  Other filtration is also available.

 (7) Notification pipes don't interfere with each other; each may be bound
     to a different set of watches.  Any particular notification will be
     copied to all the queues that are currently watching for it - and only
     those that are watching for it.

 (8) When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's insufficient
     space.  read() will fabricate a loss notification message at an
     appropriate point later.

 (9) The notification pipe is created and then watchpoints are attached to
     it, using one of:

	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
	watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
	watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

     where in both cases, fd indicates the queue and the number after is a
     tag between 0 and 255.

(10) Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed.  In the latter case, a message will
     be generated indicating the enforced watch removal.


Things I want to avoid:

 (1) Introducing features that make the core VFS dependent on the network
     stack or networking namespaces (ie. usage of netlink).

 (2) Dumping all this stuff into dmesg and having a daemon that sits there
     parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky.  Further, dmesg might not exist or might be
     inaccessible inside a container.

 (3) Letting users see events they shouldn't be able to see.


TESTING AND MANPAGES
====================

 (*) The keyutils tree has a pipe-watch branch that has keyctl commands for
     making use of notifications.  Proposed manual pages can also be found
     on this branch, though a couple of them really need to go to the main
     manpages repository instead.

     If the kernel supports the watching of keys, then running "make test"
     on that branch will cause the testing infrastructure to spawn a
     monitoring process on the side that monitors a notifications pipe for
     all the key/keyring changes induced by the tests and they'll all be
     checked off to make sure they happened.

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

 (*) A test program is provided (samples/watch_queue/watch_test) that can
     be used to monitor for keyrings, mount and superblock events.
     Information on the notifications is simply logged to stdout.

Thanks,
David
---
The following changes since commit b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce:

  Linux 5.7-rc6 (2020-05-17 16:48:37 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-20200601

for you to fetch changes up to a8478a602913dc89a7cd2060e613edecd07e1dbd:

  smack: Implement the watch_key and post_notification hooks (2020-05-19 15:47:38 +0100)

----------------------------------------------------------------
Notifications over pipes + Keyring notifications

----------------------------------------------------------------
David Howells (12):
      uapi: General notification queue definitions
      security: Add a hook for the point of notification insertion
      pipe: Add O_NOTIFICATION_PIPE
      pipe: Add general notification queue support
      security: Add hooks to rule on setting a watch
      watch_queue: Add a key/keyring notification facility
      Add sample notification program
      pipe: Allow buffers to be marked read-whole-or-error for notifications
      pipe: Add notification lossage handling
      keys: Make the KEY_NEED_* perms an enum rather than a mask
      selinux: Implement the watch_key security hook
      smack: Implement the watch_key and post_notification hooks

 Documentation/security/keys/core.rst               |  57 ++
 Documentation/userspace-api/ioctl/ioctl-number.rst |   1 +
 Documentation/watch_queue.rst                      | 339 +++++++++++
 fs/pipe.c                                          | 242 +++++---
 fs/splice.c                                        |  12 +-
 include/linux/key.h                                |  33 +-
 include/linux/lsm_audit.h                          |   1 +
 include/linux/lsm_hook_defs.h                      |   9 +
 include/linux/lsm_hooks.h                          |  14 +
 include/linux/pipe_fs_i.h                          |  27 +-
 include/linux/security.h                           |  30 +-
 include/linux/watch_queue.h                        | 127 ++++
 include/uapi/linux/keyctl.h                        |   2 +
 include/uapi/linux/watch_queue.h                   | 104 ++++
 init/Kconfig                                       |  12 +
 kernel/Makefile                                    |   1 +
 kernel/watch_queue.c                               | 659 +++++++++++++++++++++
 samples/Kconfig                                    |   6 +
 samples/Makefile                                   |   1 +
 samples/watch_queue/Makefile                       |   7 +
 samples/watch_queue/watch_test.c                   | 186 ++++++
 security/keys/Kconfig                              |   9 +
 security/keys/compat.c                             |   3 +
 security/keys/gc.c                                 |   5 +
 security/keys/internal.h                           |  38 +-
 security/keys/key.c                                |  38 +-
 security/keys/keyctl.c                             | 115 +++-
 security/keys/keyring.c                            |  20 +-
 security/keys/permission.c                         |  31 +-
 security/keys/process_keys.c                       |  46 +-
 security/keys/request_key.c                        |   4 +-
 security/security.c                                |  22 +-
 security/selinux/hooks.c                           |  51 +-
 security/smack/smack_lsm.c                         | 112 +++-
 34 files changed, 2185 insertions(+), 179 deletions(-)
 create mode 100644 Documentation/watch_queue.rst
 create mode 100644 include/linux/watch_queue.h
 create mode 100644 include/uapi/linux/watch_queue.h
 create mode 100644 kernel/watch_queue.c
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-02 15:55 [GIT PULL] General notification queue and key notifications David Howells
@ 2020-06-03  2:15 ` Ian Kent
  2020-06-08  0:49   ` Ian Kent
  2020-06-10  9:56 ` Christian Brauner
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Ian Kent @ 2020-06-03  2:15 UTC (permalink / raw)
  To: David Howells, torvalds
  Cc: viro, dray, kzak, mszeredi, swhiteho, jlayton, andres,
	christian.brauner, jarkko.sakkinen, keyrings, linux-fsdevel,
	linux-kernel

On Tue, 2020-06-02 at 16:55 +0100, David Howells wrote:
> 
> [[ With regard to the mount/sb notifications and fsinfo(), Karel Zak
> and
>    Ian Kent have been working on making libmount use them,
> preparatory to
>    working on systemd:
> 
> 	https://github.com/karelzak/util-linux/commits/topic/fsinfo
> 	
> https://github.com/raven-au/util-linux/commits/topic/fsinfo.public
> 
>    Development has stalled briefly due to other commitments, so I'm
> not
>    sure I can ask you to pull those parts of the series for
> now.  Christian
>    Brauner would like to use them in lxc, but hasn't started.
>    ]]

Linus,

Just so your aware of what has been done and where we are at here's
a summary.

Karel has done quite a bit of work on libmount (at this stage it's
getting hold of the mount information, aka. fsinfo()) and most of
what I have done is included in that too which you can see in Karel's
repo above). You can see a couple of bug fixes and a little bit of
new code present in my repo which hasn't been sent over to Karel
yet.

This infrastructure is essential before notifications work is started
which is where we will see the most improvement.

It turns out that while systemd uses libmount it has it's own
notifications handling sub-system as it deals with several event
types, not just mount information, in the same area. So, unfortunately,
changes will need to be made there as well as in libmount, more so
than the trivial changes to use fsinfo() via libmount.

That's where we are at the moment and I will get back to it once
I've dealt with a few things I postponed to work on libmount.

If you would like a more detailed account of what we have found I
can provide that too.

Is there anything else you would like from me or Karel?

Ian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-03  2:15 ` Ian Kent
@ 2020-06-08  0:49   ` Ian Kent
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Kent @ 2020-06-08  0:49 UTC (permalink / raw)
  To: David Howells, torvalds
  Cc: viro, dray, kzak, mszeredi, swhiteho, jlayton, andres,
	christian.brauner, jarkko.sakkinen, keyrings, linux-fsdevel,
	linux-kernel

On Wed, 2020-06-03 at 10:15 +0800, Ian Kent wrote:
> On Tue, 2020-06-02 at 16:55 +0100, David Howells wrote:
> > [[ With regard to the mount/sb notifications and fsinfo(), Karel
> > Zak
> > and
> >    Ian Kent have been working on making libmount use them,
> > preparatory to
> >    working on systemd:
> > 
> > 	https://github.com/karelzak/util-linux/commits/topic/fsinfo
> > 	
> > https://github.com/raven-au/util-linux/commits/topic/fsinfo.public
> > 
> >    Development has stalled briefly due to other commitments, so I'm
> > not
> >    sure I can ask you to pull those parts of the series for
> > now.  Christian
> >    Brauner would like to use them in lxc, but hasn't started.
> >    ]]
> 
> Linus,
> 
> Just so your aware of what has been done and where we are at here's
> a summary.
> 
> Karel has done quite a bit of work on libmount (at this stage it's
> getting hold of the mount information, aka. fsinfo()) and most of
> what I have done is included in that too which you can see in Karel's
> repo above). You can see a couple of bug fixes and a little bit of
> new code present in my repo which hasn't been sent over to Karel
> yet.
> 
> This infrastructure is essential before notifications work is started
> which is where we will see the most improvement.
> 
> It turns out that while systemd uses libmount it has it's own
> notifications handling sub-system as it deals with several event
> types, not just mount information, in the same area. So,
> unfortunately,
> changes will need to be made there as well as in libmount, more so
> than the trivial changes to use fsinfo() via libmount.
> 
> That's where we are at the moment and I will get back to it once
> I've dealt with a few things I postponed to work on libmount.
> 
> If you would like a more detailed account of what we have found I
> can provide that too.
> 
> Is there anything else you would like from me or Karel?

I think there's a bit more I should say about this.

One reason work hasn't progressed further on this is I spent
quite a bit of time looking at the affects of using fsinfo().

My testing was done by using a large autofs direct mount map of
20000 entries which means that at autofs startup 20000 autofs
mounts must be done and at autofs shutdown those 20000 mounts
must be umounted. Not very scientific but something to use to
get a feel for the affect of our changes.

Initially just using fsinfo() to load all the mount entries was
done to see how that would perform. This was done in a way that
required no modifications to library user code but didn't get
much improvement.

Next loading all the mount ids (alone) for mount entry traversal
was done and the various fields retrieved on-demand (implemented
by Karel).

Loading the entire mount table and then traversing the entries
means the mount table is always possibly out of date. And loading
the ids and getting the fields on-demand might have made that
problem worse. But loading only the mount ids and using an
on-demand method to get needed fields worked surprisingly well.

The main issue is a mount going away while getting the fields.
Testing showed that simply checking the field is valid and
ignoring the entry if it isn't is enough to handle that case.

Also the mount going away after the needed fields have been
retrieved must be handled by callers of libmount as mounts
can just as easily go away after reading the proc based tables.

The case of the underlying mount information changing needs to
be considered too. We will need to do better on that in the
future but it too is a problem with the proc table handing and
hasn't seen problems logged against libmount for it AFAIK.

So, all in all, this approach worked pretty well as libmount
users do use the getter access methods to retrieve the mount
entry fields (which is required for the on-demand method to
work). Certainly systemd always uses them (and it looks like
udisks2 does too).

Unfortunately using the libmount on-demand implementation
requires library user code be modified (only a little in
the systemd case) to use the implementation.

Testing showed that we get between 10-15% reduction in
overhead and CPU usage remained high.

I think processing large numbers of mounts is simply a lot
of work and there are particular cases that will remain that
require the use of the load and traverse method. For example
matching all mounts with a given prefix string (one of the
systemd use cases).

It's hard to get information about this but I can say that
running pref during the autofs start and stop shows the bulk
of the counter hits on the fsinfo() table construction code
so that ahs to be where the overhead is.

The unavoidable conclusion is that the load and traverse method
that's been imposed on us for so long (even before libmount)
for mount handling is what we need to get away from. After all,
this is essentially where the problem comes from in the first
place. And fsinfo() is designed to not need to use this method
for getting mount information for that reason.

There's also the notifications side of things which is the next
area to work on. Looking at systemd I see that monitoring the
proc mount table leads to a load, traverse, and process of the
entire table for every single notification. It's clear that's
because of the (what I'll call) anonymous notifications that we
have now.

The notifications in David's series carry event specific
information, for example the mount id for mount notifications
and the libmount fsinfo() implementation is written to use the
mount id (lowest overhead lookup option), so there has to be
significant improvement for this case.

But systemd has it's own notifications handling code so there
will need to be non-trivial changes there as well as changes
in libmount.

Bottom line is we have a bit of a challenge with this because we
are trying to change coding practices developed over many years
that, necessarily, use a load/traverse method and it's going to
take quite a while to change these coding practices.

My question is, is there something specific, besides what we are
doing, that you'd like to see done now in order to get the series
merged?

Ian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-02 15:55 [GIT PULL] General notification queue and key notifications David Howells
  2020-06-03  2:15 ` Ian Kent
@ 2020-06-10  9:56 ` Christian Brauner
  2020-06-10 11:12 ` Karel Zak
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Christian Brauner @ 2020-06-10  9:56 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, viro, dray, kzak, mszeredi, swhiteho, jlayton, raven,
	andres, jarkko.sakkinen, keyrings, linux-fsdevel, linux-kernel

On Tue, Jun 02, 2020 at 04:55:04PM +0100, David Howells wrote:
> Date: Tue, 02 Jun 2020 16:51:44 +0100
> 
> Hi Linus,
> 
> Can you pull this, please?  It adds a general notification queue concept
> and adds an event source for keys/keyrings, such as linking and unlinking
> keys and changing their attributes.
> 
> Thanks to Debarshi Ray, we do have a pull request to use this to fix a
> problem with gnome-online-accounts - as mentioned last time:
> 
>     https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47
> 
> Without this, g-o-a has to constantly poll a keyring-based kerberos cache
> to find out if kinit has changed anything.
> 
> [[ With regard to the mount/sb notifications and fsinfo(), Karel Zak and

The mount/sb notification and fsinfo() stuff is something we'd like to
use. (And then later extend to allow for supervised mounts where a
container manager can supervise the mounts of an unprivileged
container.)
I'm not sure if the mount notifications are already part of this pr.

Christian

>    Ian Kent have been working on making libmount use them, preparatory to
>    working on systemd:
> 
> 	https://github.com/karelzak/util-linux/commits/topic/fsinfo
> 	https://github.com/raven-au/util-linux/commits/topic/fsinfo.public
> 
>    Development has stalled briefly due to other commitments, so I'm not
>    sure I can ask you to pull those parts of the series for now.  Christian
>    Brauner would like to use them in lxc, but hasn't started.
>    ]]
> 
> 
> LSM hooks are included:
> 
>  (1) A set of hooks are provided that allow an LSM to rule on whether or
>      not a watch may be set.  Each of these hooks takes a different
>      "watched object" parameter, so they're not really shareable.  The LSM
>      should use current's credentials.  [Wanted by SELinux & Smack]
> 
>  (2) A hook is provided to allow an LSM to rule on whether or not a
>      particular message may be posted to a particular queue.  This is given
>      the credentials from the event generator (which may be the system) and
>      the watch setter.  [Wanted by Smack]
> 
> I've provided SELinux and Smack with implementations of some of these hooks.
> 
> 
> WHY
> ===
> 
> Key/keyring notifications are desirable because if you have your kerberos
> tickets in a file/directory, your Gnome desktop will monitor that using
> something like fanotify and tell you if your credentials cache changes.
> 
> However, we also have the ability to cache your kerberos tickets in the
> session, user or persistent keyring so that it isn't left around on disk
> across a reboot or logout.  Keyrings, however, cannot currently be
> monitored asynchronously, so the desktop has to poll for it - not so good
> on a laptop.  This facility will allow the desktop to avoid the need to
> poll.
> 
> 
> DESIGN DECISIONS
> ================
> 
>  (1) The notification queue is built on top of a standard pipe.  Messages
>      are effectively spliced in.  The pipe is opened with a special flag:
> 
> 	pipe2(fds, O_NOTIFICATION_PIPE);
> 
>      The special flag has the same value as O_EXCL (which doesn't seem like
>      it will ever be applicable in this context)[?].  It is given up front
>      to make it a lot easier to prohibit splice and co. from accessing the
>      pipe.
> 
>      [?] Should this be done some other way?  I'd rather not use up a new
>      	 O_* flag if I can avoid it - should I add a pipe3() system call
>      	 instead?
> 
>      The pipe is then configured::
> 
> 	ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
> 	ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
> 
>      Messages are then read out of the pipe using read().
> 
>  (2) It should be possible to allow write() to insert data into the
>      notification pipes too, but this is currently disabled as the kernel
>      has to be able to insert messages into the pipe *without* holding
>      pipe->mutex and the code to make this work needs careful auditing.
> 
>  (3) sendfile(), splice() and vmsplice() are disabled on notification pipes
>      because of the pipe->mutex issue and also because they sometimes want
>      to revert what they just did - but one or more notification messages
>      might've been interleaved in the ring.
> 
>  (4) The kernel inserts messages with the wait queue spinlock held.  This
>      means that pipe_read() and pipe_write() have to take the spinlock to
>      update the queue pointers.
> 
>  (5) Records in the buffer are binary, typed and have a length so that they
>      can be of varying size.
> 
>      This allows multiple heterogeneous sources to share a common buffer;
>      there are 16 million types available, of which I've used just a few,
>      so there is scope for others to be used.  Tags may be specified when a
>      watchpoint is created to help distinguish the sources.
> 
>  (6) Records are filterable as types have up to 256 subtypes that can be
>      individually filtered.  Other filtration is also available.
> 
>  (7) Notification pipes don't interfere with each other; each may be bound
>      to a different set of watches.  Any particular notification will be
>      copied to all the queues that are currently watching for it - and only
>      those that are watching for it.
> 
>  (8) When recording a notification, the kernel will not sleep, but will
>      rather mark a queue as having lost a message if there's insufficient
>      space.  read() will fabricate a loss notification message at an
>      appropriate point later.
> 
>  (9) The notification pipe is created and then watchpoints are attached to
>      it, using one of:
> 
> 	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
> 	watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
> 	watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);
> 
>      where in both cases, fd indicates the queue and the number after is a
>      tag between 0 and 255.
> 
> (10) Watches are removed if either the notification pipe is destroyed or
>      the watched object is destroyed.  In the latter case, a message will
>      be generated indicating the enforced watch removal.
> 
> 
> Things I want to avoid:
> 
>  (1) Introducing features that make the core VFS dependent on the network
>      stack or networking namespaces (ie. usage of netlink).
> 
>  (2) Dumping all this stuff into dmesg and having a daemon that sits there
>      parsing the output and distributing it as this then puts the
>      responsibility for security into userspace and makes handling
>      namespaces tricky.  Further, dmesg might not exist or might be
>      inaccessible inside a container.
> 
>  (3) Letting users see events they shouldn't be able to see.
> 
> 
> TESTING AND MANPAGES
> ====================
> 
>  (*) The keyutils tree has a pipe-watch branch that has keyctl commands for
>      making use of notifications.  Proposed manual pages can also be found
>      on this branch, though a couple of them really need to go to the main
>      manpages repository instead.
> 
>      If the kernel supports the watching of keys, then running "make test"
>      on that branch will cause the testing infrastructure to spawn a
>      monitoring process on the side that monitors a notifications pipe for
>      all the key/keyring changes induced by the tests and they'll all be
>      checked off to make sure they happened.
> 
> 	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch
> 
>  (*) A test program is provided (samples/watch_queue/watch_test) that can
>      be used to monitor for keyrings, mount and superblock events.
>      Information on the notifications is simply logged to stdout.
> 
> Thanks,
> David
> ---
> The following changes since commit b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce:
> 
>   Linux 5.7-rc6 (2020-05-17 16:48:37 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-20200601
> 
> for you to fetch changes up to a8478a602913dc89a7cd2060e613edecd07e1dbd:
> 
>   smack: Implement the watch_key and post_notification hooks (2020-05-19 15:47:38 +0100)
> 
> ----------------------------------------------------------------
> Notifications over pipes + Keyring notifications
> 
> ----------------------------------------------------------------
> David Howells (12):
>       uapi: General notification queue definitions
>       security: Add a hook for the point of notification insertion
>       pipe: Add O_NOTIFICATION_PIPE
>       pipe: Add general notification queue support
>       security: Add hooks to rule on setting a watch
>       watch_queue: Add a key/keyring notification facility
>       Add sample notification program
>       pipe: Allow buffers to be marked read-whole-or-error for notifications
>       pipe: Add notification lossage handling
>       keys: Make the KEY_NEED_* perms an enum rather than a mask
>       selinux: Implement the watch_key security hook
>       smack: Implement the watch_key and post_notification hooks
> 
>  Documentation/security/keys/core.rst               |  57 ++
>  Documentation/userspace-api/ioctl/ioctl-number.rst |   1 +
>  Documentation/watch_queue.rst                      | 339 +++++++++++
>  fs/pipe.c                                          | 242 +++++---
>  fs/splice.c                                        |  12 +-
>  include/linux/key.h                                |  33 +-
>  include/linux/lsm_audit.h                          |   1 +
>  include/linux/lsm_hook_defs.h                      |   9 +
>  include/linux/lsm_hooks.h                          |  14 +
>  include/linux/pipe_fs_i.h                          |  27 +-
>  include/linux/security.h                           |  30 +-
>  include/linux/watch_queue.h                        | 127 ++++
>  include/uapi/linux/keyctl.h                        |   2 +
>  include/uapi/linux/watch_queue.h                   | 104 ++++
>  init/Kconfig                                       |  12 +
>  kernel/Makefile                                    |   1 +
>  kernel/watch_queue.c                               | 659 +++++++++++++++++++++
>  samples/Kconfig                                    |   6 +
>  samples/Makefile                                   |   1 +
>  samples/watch_queue/Makefile                       |   7 +
>  samples/watch_queue/watch_test.c                   | 186 ++++++
>  security/keys/Kconfig                              |   9 +
>  security/keys/compat.c                             |   3 +
>  security/keys/gc.c                                 |   5 +
>  security/keys/internal.h                           |  38 +-
>  security/keys/key.c                                |  38 +-
>  security/keys/keyctl.c                             | 115 +++-
>  security/keys/keyring.c                            |  20 +-
>  security/keys/permission.c                         |  31 +-
>  security/keys/process_keys.c                       |  46 +-
>  security/keys/request_key.c                        |   4 +-
>  security/security.c                                |  22 +-
>  security/selinux/hooks.c                           |  51 +-
>  security/smack/smack_lsm.c                         | 112 +++-
>  34 files changed, 2185 insertions(+), 179 deletions(-)
>  create mode 100644 Documentation/watch_queue.rst
>  create mode 100644 include/linux/watch_queue.h
>  create mode 100644 include/uapi/linux/watch_queue.h
>  create mode 100644 kernel/watch_queue.c
>  create mode 100644 samples/watch_queue/Makefile
>  create mode 100644 samples/watch_queue/watch_test.c
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-02 15:55 [GIT PULL] General notification queue and key notifications David Howells
  2020-06-03  2:15 ` Ian Kent
  2020-06-10  9:56 ` Christian Brauner
@ 2020-06-10 11:12 ` Karel Zak
  2020-06-12 21:32   ` Linus Torvalds
                     ` (3 more replies)
  2020-06-13 18:00 ` pr-tracker-bot
  2020-06-17  1:15 ` Williams, Dan J
  4 siblings, 4 replies; 22+ messages in thread
From: Karel Zak @ 2020-06-10 11:12 UTC (permalink / raw)
  To: torvalds
  Cc: David Howells, viro, dray, mszeredi, swhiteho, jlayton, raven,
	andres, christian.brauner, jarkko.sakkinen, keyrings,
	linux-fsdevel, linux-kernel


 Hi Linus,

On Tue, Jun 02, 2020 at 04:55:04PM +0100, David Howells wrote:
> Can you pull this, please?  It adds a general notification queue concept

I'm trying to use David's notification stuff in userspace, and I guess
feedback is welcome :-)

The notification stuff looks pretty promising, but I do not understand
why we need to use pipe for this purpose, see typical userspace use-case:

        int pipefd[2], fd;

        if (pipe2(pipefd, O_NOTIFICATION_PIPE) == -1)
                err(EXIT_FAILURE, "pipe2 failed");

        fd = pipefd[0];

All the next operations are done with "fd". It's nowhere used as a
pipe, and nothing uses pipefd[1]. The first impression from this code
is "oh, this is strange; why?".

Is it because we need to create a new file descriptor from nothing?
Why O_NOTIFICATION_PIPE is better than introduce a new syscall
notifyfd()?

(We have signalfd(), no O_SIGNAL_PIPE, etc.) 

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-10 11:12 ` Karel Zak
@ 2020-06-12 21:32   ` Linus Torvalds
  2020-06-12 22:01   ` Linus Torvalds
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2020-06-12 21:32 UTC (permalink / raw)
  To: Karel Zak
  Cc: David Howells, Al Viro, dray, Miklos Szeredi, Steven Whitehouse,
	Jeff Layton, Ian Kent, andres, Christian Brauner,
	Jarkko Sakkinen, keyrings, linux-fsdevel,
	Linux Kernel Mailing List

[ Finally getting around to this since my normal pull queue is now empty ]

On Wed, Jun 10, 2020 at 4:13 AM Karel Zak <kzak@redhat.com> wrote:
>
> The notification stuff looks pretty promising, but I do not understand
> why we need to use pipe for this purpose

The original intent was never to use the "pipe()" system call itself,
only use pipes as the actual transport mechanism (because I do not for
a second believe in the crazy "use sockets" model that a lot of other
people seem to blindly believe in).

But using "pipe()" also allows for non-kernel notification queues (ie
where the events come from a user space process). Then you'd not use
O_NOTIFICATION_PIPE, but O_DIRECT (for a packetized pipe).

> Is it because we need to create a new file descriptor from nothing?
> Why O_NOTIFICATION_PIPE is better than introduce a new syscall
> notifyfd()?

We could eventually introduce a new system call.

But I most definitely did *NOT* want to see anything like that for any
first gen stuff.  Especially since it wasn't clear who was going to
use it, and whether early trials would literally be done with that
user-space emulation model of using a perfectly regular pipe (just
with packetization).

I'm not even convinced O_NOTIFICATION_PIPE is necessary, but at worst
it will be a useful marker. I think the only real reason for it was to
avoid any clashes with splice(), which has more complex use of the
pipe buffers.

I'm so far just reading this thread and the arguments for users, and I
haven't yet looked at all the actual details in the pull request - but
last time I had objections to things it wasn't the code, it was the
lack of any use.

             Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-10 11:12 ` Karel Zak
  2020-06-12 21:32   ` Linus Torvalds
@ 2020-06-12 22:01   ` Linus Torvalds
  2020-06-13 13:04   ` David Howells
  2020-06-13 13:24   ` David Howells
  3 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2020-06-12 22:01 UTC (permalink / raw)
  To: Karel Zak
  Cc: David Howells, Al Viro, dray, Miklos Szeredi, Steven Whitehouse,
	Jeff Layton, Ian Kent, andres, Christian Brauner,
	Jarkko Sakkinen, keyrings, linux-fsdevel,
	Linux Kernel Mailing List

[ Actually going through the code now ]

On Wed, Jun 10, 2020 at 4:13 AM Karel Zak <kzak@redhat.com> wrote:
>
> All the next operations are done with "fd". It's nowhere used as a
> pipe, and nothing uses pipefd[1].

As an aside, that isn't necessarily true.

In some of the examples, pipefd[1] is used for configuration (sizing
and adding filters), although I think right now that's not really
enforced, and other examples seem to have pipefd[0] do that too.

DavidH: should that perhaps be a hard rule, so that you can pass a
pipefd[0] to readers, while knowing that they can't then change the
kinds of notifications they see.

In the "pipe: Add general notification queue support" commit message,
the code example uses pipefd[0] for IOC_WATCH_QUEUE_SET_SIZE, but then
in the commit message for "watch_queue: Add a key/keyring notification
facility" it uses pipefd[1].

And that latter example does make sense: using the write-side
pipefd[1] for configuration, while the read-side pipefd[0] is the side
that sees the results. That is also how it would work if you have a
user-mode pipe with the notification source controlling the writing
side - the reading side can obviously not add filters or change the
semantics of the watches.

So that allows a trusted side to add and create filters, while some
untrusted entity can then see the results.

This isn't going to hold up me merging the code, but it would be good
to clarify and make that something that gets enforced if we decide
it's worth it.

It does seem conceptually like a good idea, and potentially actually
useful to clearly separate the domain of "you can add watches and
filters" from "you can see the notifications".

               Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-10 11:12 ` Karel Zak
  2020-06-12 21:32   ` Linus Torvalds
  2020-06-12 22:01   ` Linus Torvalds
@ 2020-06-13 13:04   ` David Howells
  2020-06-13 16:47     ` Linus Torvalds
  2020-06-13 19:22     ` Miklos Szeredi
  2020-06-13 13:24   ` David Howells
  3 siblings, 2 replies; 22+ messages in thread
From: David Howells @ 2020-06-13 13:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Karel Zak, Al Viro, dray, Miklos Szeredi,
	Steven Whitehouse, Jeff Layton, Ian Kent, andres,
	Christian Brauner, Jarkko Sakkinen, keyrings, linux-fsdevel,
	Linux Kernel Mailing List

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I'm not even convinced O_NOTIFICATION_PIPE is necessary, but at worst
> it will be a useful marker. I think the only real reason for it was to
> avoid any clashes with splice(), which has more complex use of the
> pipe buffers.

The main reason is to prevent splice because the iov_iter rewind for splice
gets quite tricky if the kernel can randomly insert packets into the pipe
buffer in between what splice is inserting.

> I'm so far just reading this thread and the arguments for users, and I
> haven't yet looked at all the actual details in the pull request - but
> last time I had objections to things it wasn't the code, it was the
> lack of any use.

Would you be willing at this point to consider pulling the mount notifications
and fsinfo() which helps support that?  I could whip up pull reqs for those
two pieces - or do you want to see more concrete patches that use it?

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-10 11:12 ` Karel Zak
                     ` (2 preceding siblings ...)
  2020-06-13 13:04   ` David Howells
@ 2020-06-13 13:24   ` David Howells
  3 siblings, 0 replies; 22+ messages in thread
From: David Howells @ 2020-06-13 13:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Karel Zak, Al Viro, dray, Miklos Szeredi,
	Steven Whitehouse, Jeff Layton, Ian Kent, andres,
	Christian Brauner, Jarkko Sakkinen, keyrings, linux-fsdevel,
	Linux Kernel Mailing List

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > All the next operations are done with "fd". It's nowhere used as a
> > pipe, and nothing uses pipefd[1].
> 
> As an aside, that isn't necessarily true.
> 
> In some of the examples, pipefd[1] is used for configuration (sizing
> and adding filters), although I think right now that's not really
> enforced, and other examples seem to have pipefd[0] do that too.

The configuration can happen on either end of the pipe.  I just need to be
able to find the pipe object.

> DavidH: should that perhaps be a hard rule, so that you can pass a
> pipefd[0] to readers, while knowing that they can't then change the
> kinds of notifications they see.

You can argue that the other way: that it should be a hard rule that you can
pass pipefd[1] to writers, whilst knowing that they can't then change the kind
of notifications that the kernel can insert into the pipe.  My feeling is that
it's more likely that you would keep the read end yourself and give the write
end away - if at all.  Most likely, IMO, would be that you attach notification
sources and never use the write end directly.

There is some argument for making it so that the notification sources belong
to the read end only and that they keep the write side alive internally -
meaning that you can just close the write end.  All the notification sources
just then disappear when the read end is closed - but dup() might make this
kind of tricky as there is only one pipe object and its shared between both
ends.  The existence of O_RDWR FIFOs might also make this tricky.

> In the "pipe: Add general notification queue support" commit message,
> the code example uses pipefd[0] for IOC_WATCH_QUEUE_SET_SIZE, but then
> in the commit message for "watch_queue: Add a key/keyring notification
> facility" it uses pipefd[1].
>
> And that latter example does make sense: using the write-side
> pipefd[1] for configuration, while the read-side pipefd[0] is the side
> that sees the results. That is also how it would work if you have a
> user-mode pipe with the notification source controlling the writing
> side - the reading side can obviously not add filters or change the
> semantics of the watches.
> 
> So that allows a trusted side to add and create filters, while some
> untrusted entity can then see the results.

As stated above, I think you should be looking at this the other way round -
you're more likely to keep the read end for yourself.  If you attach multiple
sources to a pipe, everything they produce comes out mixed together from the
read end of the pipe.

You might even pass the write end to multiple userspace-side event generators,
but I'm not sure it would make sense to pass the read end around unless you
have sufficient flow that you need multiple consumers to keep up with it.

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-13 13:04   ` David Howells
@ 2020-06-13 16:47     ` Linus Torvalds
  2020-06-13 17:03       ` Linus Torvalds
  2020-06-13 19:22     ` Miklos Szeredi
  1 sibling, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2020-06-13 16:47 UTC (permalink / raw)
  To: David Howells
  Cc: Karel Zak, Al Viro, dray, Miklos Szeredi, Steven Whitehouse,
	Jeff Layton, Ian Kent, andres, Christian Brauner,
	Jarkko Sakkinen, keyrings, linux-fsdevel,
	Linux Kernel Mailing List

On Sat, Jun 13, 2020 at 6:05 AM David Howells <dhowells@redhat.com> wrote:
>
> Would you be willing at this point to consider pulling the mount notifications
> and fsinfo() which helps support that?  I could whip up pull reqs for those
> two pieces - or do you want to see more concrete patches that use it?

I'd want to see more concrete use cases, but I'd also like to see that
this keyring thing gets used and doesn't find any show-stoppers when
it does.

If we have multiple uses, and one of them notices some problem that
requires any ABI changes, but the other one has already started using
it, we'll have more problems.

          Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-13 16:47     ` Linus Torvalds
@ 2020-06-13 17:03       ` Linus Torvalds
  0 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2020-06-13 17:03 UTC (permalink / raw)
  To: David Howells
  Cc: Karel Zak, Al Viro, dray, Miklos Szeredi, Steven Whitehouse,
	Jeff Layton, Ian Kent, andres, Christian Brauner,
	Jarkko Sakkinen, keyrings, linux-fsdevel,
	Linux Kernel Mailing List

On Sat, Jun 13, 2020 at 9:47 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> If we have multiple uses, and one of them notices some problem that
> requires any ABI changes, but the other one has already started using
> it, we'll have more problems.

Ok, it's merged in my tree, although I was somewhat unhappy about the
incomprehensible calling conventions of "get_pipe_info()". The random
second argument just makes no sense when you read the code, it would
have probably been better as a helper function or #define to clarify
the whole "for_splice" thing.

But let's see how it works and what actually happens.

               Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-02 15:55 [GIT PULL] General notification queue and key notifications David Howells
                   ` (2 preceding siblings ...)
  2020-06-10 11:12 ` Karel Zak
@ 2020-06-13 18:00 ` pr-tracker-bot
  2020-06-17  1:15 ` Williams, Dan J
  4 siblings, 0 replies; 22+ messages in thread
From: pr-tracker-bot @ 2020-06-13 18:00 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, dhowells, viro, dray, kzak, mszeredi, swhiteho,
	jlayton, raven, andres, christian.brauner, jarkko.sakkinen,
	keyrings, linux-fsdevel, linux-kernel

The pull request you sent on Tue, 02 Jun 2020 16:55:04 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-20200601

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6c3297841472b4e53e22e53826eea9e483d993e5

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-13 13:04   ` David Howells
  2020-06-13 16:47     ` Linus Torvalds
@ 2020-06-13 19:22     ` Miklos Szeredi
  1 sibling, 0 replies; 22+ messages in thread
From: Miklos Szeredi @ 2020-06-13 19:22 UTC (permalink / raw)
  To: David Howells
  Cc: Linus Torvalds, Karel Zak, Al Viro, dray, Steven Whitehouse,
	Jeff Layton, Ian Kent, andres, Christian Brauner,
	Jarkko Sakkinen, keyrings, linux-fsdevel,
	Linux Kernel Mailing List

On Sat, Jun 13, 2020 at 3:05 PM David Howells <dhowells@redhat.com> wrote:

> > I'm so far just reading this thread and the arguments for users, and I
> > haven't yet looked at all the actual details in the pull request - but
> > last time I had objections to things it wasn't the code, it was the
> > lack of any use.
>
> Would you be willing at this point to consider pulling the mount notifications
> and fsinfo() which helps support that?  I could whip up pull reqs for those
> two pieces - or do you want to see more concrete patches that use it?

Well, I had some questions and comments for the mount notifications
last time around[1] and didn't yet get a reply.

And the fsinfo stuff is simply immature, please lets not merge it just
yet.  When we have some uses (most notably systemd) running on top of
the current fsinfo interface, we can sit down and discuss how the API
can be cleaned up.

BTW I had a similar experience with the fsconfig() merge, which was
pushed with some unpolished bits and where my comments were also
largely ignored.  So, before asking to pull, please at least *answer*
reviews.  You don't have to agree, but at least consider and think
about the comments.

Thanks,
Miklos

[1] https://lore.kernel.org/linux-fsdevel/CAJfpegspWA6oUtdcYvYF=3fij=Bnq03b8VMbU9RNMKc+zzjbag@mail.gmail.com/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-02 15:55 [GIT PULL] General notification queue and key notifications David Howells
                   ` (3 preceding siblings ...)
  2020-06-13 18:00 ` pr-tracker-bot
@ 2020-06-17  1:15 ` Williams, Dan J
  2020-06-23 23:38   ` Dan Williams
  2020-06-24  0:55   ` David Howells
  4 siblings, 2 replies; 22+ messages in thread
From: Williams, Dan J @ 2020-06-17  1:15 UTC (permalink / raw)
  To: torvalds, dhowells
  Cc: raven, kzak, jarkko.sakkinen, linux-nvdimm, dray, swhiteho,
	linux-kernel, linux-fsdevel, mszeredi, jlayton, viro, andres,
	keyrings, christian.brauner

Hi David,

On Tue, 2020-06-02 at 16:55 +0100, David Howells wrote:
> Date: Tue, 02 Jun 2020 16:51:44 +0100
> 
> Hi Linus,
> 
> Can you pull this, please?  It adds a general notification queue
> concept
> and adds an event source for keys/keyrings, such as linking and
> unlinking
> keys and changing their attributes.
[..]

This commit:

>       keys: Make the KEY_NEED_* perms an enum rather than a mask

...upstream as:

    8c0637e950d6 keys: Make the KEY_NEED_* perms an enum rather than a mask

...triggers a regression in the libnvdimm unit test that exercises the
encrypted keys used to store nvdimm passphrases. It results in the
below warning.

---

WARNING: CPU: 15 PID: 6276 at security/keys/permission.c:35 key_task_permission+0xd3/0x140
Modules linked in: nd_blk(OE) nfit_test(OE) device_dax(OE) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) kvm_intel(E) kvm(E) irqbypass(E) nd_pmem(OE) dax_pmem(OE) nd_btt(OE) dax_p
ct10dif_pclmul(E) nd_e820(OE) nfit(OE) crc32_pclmul(E) libnvdimm(OE) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) encrypted_keys(E) trusted(E) nfit_test_iomap(OE) tpm(E) drm(E)
CPU: 15 PID: 6276 Comm: lt-ndctl Tainted: G           OE     5.7.0-rc6+ #155
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
RIP: 0010:key_task_permission+0xd3/0x140
Code: c8 21 d9 39 d9 75 25 48 83 c4 08 4c 89 e6 48 89 ef 5b 5d 41 5c 41 5d e9 1b a7 00 00 bb 01 00 00 00 83 fa 01 0f 84 68 ff ff ff <0f> 0b 48 83 c4 08 b8 f3 ff ff ff 5b 5d 41 5c 41 5d c3 83 fa 06

RSP: 0018:ffffaddc42db7c90 EFLAGS: 00010297
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffaddc42db7c7c
RDX: 0000000000000000 RSI: ffff985e1c46e840 RDI: ffff985e3a03de01
RBP: ffff985e3a03de01 R08: 0000000000000000 R09: 5461e7bc000002a0
R10: 0000000000000004 R11: 0000000066666666 R12: ffff985e1c46e840
R13: 0000000000000000 R14: ffffaddc42db7cd8 R15: ffff985e248c6540
FS:  00007f863c18a780(0000) GS:ffff985e3bbc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000006d3708 CR3: 0000000125a1e006 CR4: 0000000000160ee0
Call Trace:
 lookup_user_key+0xeb/0x6b0
 ? vsscanf+0x3df/0x840
 ? key_validate+0x50/0x50
 ? key_default_cmp+0x20/0x20
 nvdimm_get_user_key_payload.part.0+0x21/0x110 [libnvdimm]
 nvdimm_security_store+0x67d/0xb20 [libnvdimm]
 security_store+0x67/0x1a0 [libnvdimm]
 kernfs_fop_write+0xcf/0x1c0
 vfs_write+0xde/0x1d0
 ksys_write+0x68/0xe0
 do_syscall_64+0x5c/0xa0
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7f863c624547
Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffd61d8f5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007ffd61d8f640 RCX: 00007f863c624547
RDX: 0000000000000014 RSI: 00007ffd61d8f640 RDI: 0000000000000005
RBP: 0000000000000005 R08: 0000000000000014 R09: 00007ffd61d8f4a0
R10: fffffffffffff455 R11: 0000000000000246 R12: 00000000006dbbf0
R13: 00000000006cd710 R14: 00007f863c18a6a8 R15: 00007ffd61d8fae0
irq event stamp: 36976
hardirqs last  enabled at (36975): [<ffffffff9131fa40>] __slab_alloc+0x70/0x90
hardirqs last disabled at (36976): [<ffffffff910049c7>] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last  enabled at (35474): [<ffffffff91e00357>] __do_softirq+0x357/0x466
softirqs last disabled at (35467): [<ffffffff910eae96>] irq_exit+0xe6/0xf0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-17  1:15 ` Williams, Dan J
@ 2020-06-23 23:38   ` Dan Williams
  2020-06-24  0:55   ` David Howells
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-06-23 23:38 UTC (permalink / raw)
  To: torvalds, dhowells
  Cc: raven, kzak, jarkko.sakkinen, linux-nvdimm, dray, swhiteho,
	linux-kernel, linux-fsdevel, mszeredi, jlayton, viro, andres,
	keyrings, christian.brauner

On Tue, Jun 16, 2020 at 6:15 PM Williams, Dan J
<dan.j.williams@intel.com> wrote:
>
> Hi David,
>
> On Tue, 2020-06-02 at 16:55 +0100, David Howells wrote:
> > Date: Tue, 02 Jun 2020 16:51:44 +0100
> >
> > Hi Linus,
> >
> > Can you pull this, please?  It adds a general notification queue
> > concept
> > and adds an event source for keys/keyrings, such as linking and
> > unlinking
> > keys and changing their attributes.
> [..]
>
> This commit:
>
> >       keys: Make the KEY_NEED_* perms an enum rather than a mask
>
> ...upstream as:
>
>     8c0637e950d6 keys: Make the KEY_NEED_* perms an enum rather than a mask
>
> ...triggers a regression in the libnvdimm unit test that exercises the
> encrypted keys used to store nvdimm passphrases. It results in the
> below warning.

This regression is still present in tip of tree. David, have you had a
chance to take a look?



>
> ---
>
> WARNING: CPU: 15 PID: 6276 at security/keys/permission.c:35 key_task_permission+0xd3/0x140
> Modules linked in: nd_blk(OE) nfit_test(OE) device_dax(OE) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) kvm_intel(E) kvm(E) irqbypass(E) nd_pmem(OE) dax_pmem(OE) nd_btt(OE) dax_p
> ct10dif_pclmul(E) nd_e820(OE) nfit(OE) crc32_pclmul(E) libnvdimm(OE) crc32c_intel(E) ghash_clmulni_intel(E) serio_raw(E) encrypted_keys(E) trusted(E) nfit_test_iomap(OE) tpm(E) drm(E)
> CPU: 15 PID: 6276 Comm: lt-ndctl Tainted: G           OE     5.7.0-rc6+ #155
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
> RIP: 0010:key_task_permission+0xd3/0x140
> Code: c8 21 d9 39 d9 75 25 48 83 c4 08 4c 89 e6 48 89 ef 5b 5d 41 5c 41 5d e9 1b a7 00 00 bb 01 00 00 00 83 fa 01 0f 84 68 ff ff ff <0f> 0b 48 83 c4 08 b8 f3 ff ff ff 5b 5d 41 5c 41 5d c3 83 fa 06
>
> RSP: 0018:ffffaddc42db7c90 EFLAGS: 00010297
> RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffaddc42db7c7c
> RDX: 0000000000000000 RSI: ffff985e1c46e840 RDI: ffff985e3a03de01
> RBP: ffff985e3a03de01 R08: 0000000000000000 R09: 5461e7bc000002a0
> R10: 0000000000000004 R11: 0000000066666666 R12: ffff985e1c46e840
> R13: 0000000000000000 R14: ffffaddc42db7cd8 R15: ffff985e248c6540
> FS:  00007f863c18a780(0000) GS:ffff985e3bbc0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000006d3708 CR3: 0000000125a1e006 CR4: 0000000000160ee0
> Call Trace:
>  lookup_user_key+0xeb/0x6b0
>  ? vsscanf+0x3df/0x840
>  ? key_validate+0x50/0x50
>  ? key_default_cmp+0x20/0x20
>  nvdimm_get_user_key_payload.part.0+0x21/0x110 [libnvdimm]
>  nvdimm_security_store+0x67d/0xb20 [libnvdimm]
>  security_store+0x67/0x1a0 [libnvdimm]
>  kernfs_fop_write+0xcf/0x1c0
>  vfs_write+0xde/0x1d0
>  ksys_write+0x68/0xe0
>  do_syscall_64+0x5c/0xa0
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> RIP: 0033:0x7f863c624547
> Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> RSP: 002b:00007ffd61d8f5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 00007ffd61d8f640 RCX: 00007f863c624547
> RDX: 0000000000000014 RSI: 00007ffd61d8f640 RDI: 0000000000000005
> RBP: 0000000000000005 R08: 0000000000000014 R09: 00007ffd61d8f4a0
> R10: fffffffffffff455 R11: 0000000000000246 R12: 00000000006dbbf0
> R13: 00000000006cd710 R14: 00007f863c18a6a8 R15: 00007ffd61d8fae0
> irq event stamp: 36976
> hardirqs last  enabled at (36975): [<ffffffff9131fa40>] __slab_alloc+0x70/0x90
> hardirqs last disabled at (36976): [<ffffffff910049c7>] trace_hardirqs_off_thunk+0x1a/0x1c
> softirqs last  enabled at (35474): [<ffffffff91e00357>] __do_softirq+0x357/0x466
> softirqs last disabled at (35467): [<ffffffff910eae96>] irq_exit+0xe6/0xf0
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-17  1:15 ` Williams, Dan J
  2020-06-23 23:38   ` Dan Williams
@ 2020-06-24  0:55   ` David Howells
  2020-06-24  1:03     ` Dan Williams
  2020-06-24  1:17     ` David Howells
  1 sibling, 2 replies; 22+ messages in thread
From: David Howells @ 2020-06-24  0:55 UTC (permalink / raw)
  To: Dan Williams
  Cc: dhowells, torvalds, raven, kzak, jarkko.sakkinen, linux-nvdimm,
	dray, swhiteho, linux-kernel, linux-fsdevel, mszeredi, jlayton,
	viro, andres, keyrings, christian.brauner

Dan Williams <dan.j.williams@intel.com> wrote:

> > This commit:
> >
> > >       keys: Make the KEY_NEED_* perms an enum rather than a mask
> >
> > ...upstream as:
> >
> >     8c0637e950d6 keys: Make the KEY_NEED_* perms an enum rather than a mask
> >
> > ...triggers a regression in the libnvdimm unit test that exercises the
> > encrypted keys used to store nvdimm passphrases. It results in the
> > below warning.
> 
> This regression is still present in tip of tree. David, have you had a
> chance to take a look?

nvdimm_lookup_user_key() needs to indicate to lookup_user_key() what it wants
the key for so that the appropriate security checks can take place in SELinux
and Smack.  Note that I have a patch in the works that changes this still
further.

Does setting the third argument of lookup_user_key() to KEY_NEED_SEARCH work
for you?

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-24  0:55   ` David Howells
@ 2020-06-24  1:03     ` Dan Williams
  2020-06-24  1:17     ` David Howells
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Williams @ 2020-06-24  1:03 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, raven, kzak, jarkko.sakkinen, linux-nvdimm, dray,
	swhiteho, linux-kernel, linux-fsdevel, mszeredi, jlayton, viro,
	andres, keyrings, christian.brauner

On Tue, Jun 23, 2020 at 5:55 PM David Howells <dhowells@redhat.com> wrote:
>
> Dan Williams <dan.j.williams@intel.com> wrote:
>
> > > This commit:
> > >
> > > >       keys: Make the KEY_NEED_* perms an enum rather than a mask
> > >
> > > ...upstream as:
> > >
> > >     8c0637e950d6 keys: Make the KEY_NEED_* perms an enum rather than a mask
> > >
> > > ...triggers a regression in the libnvdimm unit test that exercises the
> > > encrypted keys used to store nvdimm passphrases. It results in the
> > > below warning.
> >
> > This regression is still present in tip of tree. David, have you had a
> > chance to take a look?
>
> nvdimm_lookup_user_key() needs to indicate to lookup_user_key() what it wants
> the key for so that the appropriate security checks can take place in SELinux
> and Smack.  Note that I have a patch in the works that changes this still
> further.
>
> Does setting the third argument of lookup_user_key() to KEY_NEED_SEARCH work
> for you?

It does, thanks.

Shall I wait for your further reworks to fix this for v5.8, or is that
v5.9 material?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-24  0:55   ` David Howells
  2020-06-24  1:03     ` Dan Williams
@ 2020-06-24  1:17     ` David Howells
  1 sibling, 0 replies; 22+ messages in thread
From: David Howells @ 2020-06-24  1:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: dhowells, torvalds, raven, kzak, jarkko.sakkinen, linux-nvdimm,
	dray, swhiteho, linux-kernel, linux-fsdevel, mszeredi, jlayton,
	viro, andres, keyrings, christian.brauner

Dan Williams <dan.j.williams@intel.com> wrote:

> Shall I wait for your further reworks to fix this for v5.8, or is that
> v5.9 material?

It could do with stewing in linux-next for a while, so 5.9 probably.

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-06-02 15:51 David Howells
@ 2020-06-02 15:54 ` David Howells
  0 siblings, 0 replies; 22+ messages in thread
From: David Howells @ 2020-06-02 15:54 UTC (permalink / raw)
  Cc: dhowells, torvalds, viro, dray, kzak, mszeredi, swhiteho,
	jlayton, raven, andres, christian.brauner, jarkko.sakkinen,
	keyrings, linux-fsdevel, linux-kernel

Oops - I forgot to include the pull request.  Will resend.

David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [GIT PULL] General notification queue and key notifications
@ 2020-06-02 15:51 David Howells
  2020-06-02 15:54 ` David Howells
  0 siblings, 1 reply; 22+ messages in thread
From: David Howells @ 2020-06-02 15:51 UTC (permalink / raw)
  To: torvalds
  Cc: dhowells, viro, dray, kzak, mszeredi, swhiteho, jlayton, raven,
	andres, christian.brauner, jarkko.sakkinen, keyrings,
	linux-fsdevel, linux-kernel

Hi Linus,

Can you pull this, please?  It adds a general notification queue concept
and adds an event source for keys/keyrings, such as linking and unlinking
keys and changing their attributes.

Thanks to Debarshi Ray, we do have a pull request to use this to fix a
problem with gnome-online-accounts - as mentioned last time:

    https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47

Without this, g-o-a has to constantly poll a keyring-based kerberos cache
to find out if kinit has changed anything.

[[ With regard to the mount/sb notifications and fsinfo(), Karel Zak and
   Ian Kent have been working on making libmount use them, preparatory to
   working on systemd:

	https://github.com/karelzak/util-linux/commits/topic/fsinfo
	https://github.com/raven-au/util-linux/commits/topic/fsinfo.public

   Development has stalled briefly due to other commitments, so I'm not
   sure I can ask you to pull those parts of the series for now.  Christian
   Brauner would like to use them in lxc, but hasn't started.
   ]]


LSM hooks are included:

 (1) A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set.  Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable.  The LSM
     should use current's credentials.  [Wanted by SELinux & Smack]

 (2) A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue.  This is given
     the credentials from the event generator (which may be the system) and
     the watch setter.  [Wanted by Smack]

I've provided SELinux and Smack with implementations of some of these hooks.


WHY
===

Key/keyring notifications are desirable because if you have your kerberos
tickets in a file/directory, your Gnome desktop will monitor that using
something like fanotify and tell you if your credentials cache changes.

However, we also have the ability to cache your kerberos tickets in the
session, user or persistent keyring so that it isn't left around on disk
across a reboot or logout.  Keyrings, however, cannot currently be
monitored asynchronously, so the desktop has to poll for it - not so good
on a laptop.  This facility will allow the desktop to avoid the need to
poll.


DESIGN DECISIONS
================

 (1) The notification queue is built on top of a standard pipe.  Messages
     are effectively spliced in.  The pipe is opened with a special flag:

	pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem like
     it will ever be applicable in this context)[?].  It is given up front
     to make it a lot easier to prohibit splice and co. from accessing the
     pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
     	 O_* flag if I can avoid it - should I add a pipe3() system call
     	 instead?

     The pipe is then configured::

	ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
	ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

 (2) It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the kernel
     has to be able to insert messages into the pipe *without* holding
     pipe->mutex and the code to make this work needs careful auditing.

 (3) sendfile(), splice() and vmsplice() are disabled on notification pipes
     because of the pipe->mutex issue and also because they sometimes want
     to revert what they just did - but one or more notification messages
     might've been interleaved in the ring.

 (4) The kernel inserts messages with the wait queue spinlock held.  This
     means that pipe_read() and pipe_write() have to take the spinlock to
     update the queue pointers.

 (5) Records in the buffer are binary, typed and have a length so that they
     can be of varying size.

     This allows multiple heterogeneous sources to share a common buffer;
     there are 16 million types available, of which I've used just a few,
     so there is scope for others to be used.  Tags may be specified when a
     watchpoint is created to help distinguish the sources.

 (6) Records are filterable as types have up to 256 subtypes that can be
     individually filtered.  Other filtration is also available.

 (7) Notification pipes don't interfere with each other; each may be bound
     to a different set of watches.  Any particular notification will be
     copied to all the queues that are currently watching for it - and only
     those that are watching for it.

 (8) When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's insufficient
     space.  read() will fabricate a loss notification message at an
     appropriate point later.

 (9) The notification pipe is created and then watchpoints are attached to
     it, using one of:

	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
	watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
	watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

     where in both cases, fd indicates the queue and the number after is a
     tag between 0 and 255.

(10) Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed.  In the latter case, a message will
     be generated indicating the enforced watch removal.


Things I want to avoid:

 (1) Introducing features that make the core VFS dependent on the network
     stack or networking namespaces (ie. usage of netlink).

 (2) Dumping all this stuff into dmesg and having a daemon that sits there
     parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky.  Further, dmesg might not exist or might be
     inaccessible inside a container.

 (3) Letting users see events they shouldn't be able to see.


TESTING AND MANPAGES
====================

 (*) The keyutils tree has a pipe-watch branch that has keyctl commands for
     making use of notifications.  Proposed manual pages can also be found
     on this branch, though a couple of them really need to go to the main
     manpages repository instead.

     If the kernel supports the watching of keys, then running "make test"
     on that branch will cause the testing infrastructure to spawn a
     monitoring process on the side that monitors a notifications pipe for
     all the key/keyring changes induced by the tests and they'll all be
     checked off to make sure they happened.

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

 (*) A test program is provided (samples/watch_queue/watch_test) that can
     be used to monitor for keyrings, mount and superblock events.
     Information on the notifications is simply logged to stdout.

Thanks,
David


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [GIT PULL] General notification queue and key notifications
  2020-03-30 14:31 ` [GIT PULL] General notification queue and key notifications David Howells
@ 2020-03-31  6:51   ` Stephen Rothwell
  0 siblings, 0 replies; 22+ messages in thread
From: Stephen Rothwell @ 2020-03-31  6:51 UTC (permalink / raw)
  To: David Howells
  Cc: torvalds, viro, dray, kzak, mszeredi, swhiteho, jlayton, raven,
	andres, christian.brauner, jarkko.sakkinen, keyrings,
	linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2288 bytes --]

Hi David,

On Mon, 30 Mar 2020 15:31:04 +0100 David Howells <dhowells@redhat.com> wrote:
>
>       pipe: Add general notification queue support

This commit has a (reasonably simple) conflict against commit

  6551d5c56eb0 ("pipe: make sure to wake up everybody when the last reader/writer closes")

from Linus' tree.

Also a semantic conflict against commit

  52b31bc9aabc ("io_uring: add splice(2) support")

from the block tree needing this fix up (white space damaged)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index fb8fe0bd5e18..8cdd3870cd4e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2470,7 +2470,7 @@ static int io_splice_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 static bool io_splice_punt(struct file *file)
 {
-	if (get_pipe_info(file))
+	if (get_pipe_info(file, true))
 		return false;
 	if (!io_file_supports_async(file))
 		return true;
>	security: Add hooks to rule on setting a watch
>	security: Add a hook for the point of notification insertion

And these have a conflict against commitinclude/linux/lsm_hooks.h

  98e828a0650f ("security: Refactor declaration of LSM hooks")

from the bpf-next tree (will be in the net-next tree pull).  That
requires taking the net-next version of include/linux/lsm_hooks.h and
then applying the following patch:

diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 9cd4455528e5..4f8d63fd1327 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -252,6 +252,16 @@ LSM_HOOK(int, 0, inode_notifysecctx, struct inode *inode, void *ctx, u32 ctxlen)
 LSM_HOOK(int, 0, inode_setsecctx, struct dentry *dentry, void *ctx, u32 ctxlen)
 LSM_HOOK(int, 0, inode_getsecctx, struct inode *inode, void **ctx,
 	 u32 *ctxlen)
+#ifdef CONFIG_KEY_NOTIFICATIONS
+LSM_HOOK(int, 0, watch_key, struct key *key)
+#endif
+#ifdef CONFIG_DEVICE_NOTIFICATIONS
+LSM_HOOK(int, 0, watch_devices, void)
+#endif
+#ifdef CONFIG_WATCH_QUEUE
+LSM_HOOK(int, 0, post_notification, const struct cred *w_cred,
+	 const struct cred *cred, struct watch_notification *n)
+#endif
 
 #ifdef CONFIG_SECURITY_NETWORK
 LSM_HOOK(int, 0, unix_stream_connect, struct sock *sock, struct sock *other,

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [GIT PULL] General notification queue and key notifications
  2020-03-30 13:58 Upcoming: Notifications, FS notifications and fsinfo() David Howells
@ 2020-03-30 14:31 ` David Howells
  2020-03-31  6:51   ` Stephen Rothwell
  0 siblings, 1 reply; 22+ messages in thread
From: David Howells @ 2020-03-30 14:31 UTC (permalink / raw)
  To: torvalds
  Cc: dhowells, viro, dray, kzak, mszeredi, swhiteho, jlayton, raven,
	andres, christian.brauner, jarkko.sakkinen, keyrings,
	linux-fsdevel, linux-kernel

Hi Linus,

Can you pull this, please?  It adds a general notification queue concept
and adds an event source for keys/keyrings, such as linking and unlinking
keys and changing their attributes.  A subsequent pull request will add
mount and superblock event sources.

LSM hooks are included:

 (1) A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set.  Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable.  The LSM
     should use current's credentials.  [Wanted by SELinux & Smack]

 (2) A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue.  This is given
     the credentials from the event generator (which may be the system) and
     the watch setter.  [Wanted by Smack]

I've provided SELinux and Smack with implementations of some of these hooks.


WHY
===

Key/keyring notifications are desirable because if you have your kerberos
tickets in a file/directory, your Gnome desktop will monitor that using
something like fanotify and tell you if your credentials cache changes.

However, we also have the ability to cache your kerberos tickets in the
session, user or persistent keyring so that it isn't left around on disk
across a reboot or logout.  Keyrings, however, cannot currently be
monitored asynchronously, so the desktop has to poll for it - not so good
on a laptop.

This source will allow the desktop to avoid the need to poll.  Here's a
pull request for usage by gnome-online-accounts:

    https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47


DESIGN DECISIONS
================

 (1) The notification queue is built on top of a standard pipe.  Messages
     are effectively spliced in.  The pipe is opened with a special flag:

	pipe2(fds, O_NOTIFICATION_PIPE);

     The special flag has the same value as O_EXCL (which doesn't seem like
     it will ever be applicable in this context)[?].  It is given up front
     to make it a lot easier to prohibit splice and co. from accessing the
     pipe.

     [?] Should this be done some other way?  I'd rather not use up a new
     	 O_* flag if I can avoid it - should I add a pipe3() system call
     	 instead?

     The pipe is then configured::

	ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
	ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);

     Messages are then read out of the pipe using read().

 (2) It should be possible to allow write() to insert data into the
     notification pipes too, but this is currently disabled as the kernel
     has to be able to insert messages into the pipe *without* holding
     pipe->mutex and the code to make this work needs careful auditing.

 (3) sendfile(), splice() and vmsplice() are disabled on notification pipes
     because of the pipe->mutex issue and also because they sometimes want
     to revert what they just did - but one or more notification messages
     might've been interleaved in the ring.

 (4) The kernel inserts messages with the wait queue spinlock held.  This
     means that pipe_read() and pipe_write() have to take the spinlock to
     update the queue pointers.

 (5) Records in the buffer are binary, typed and have a length so that they
     can be of varying size.

     This allows multiple heterogeneous sources to share a common buffer;
     there are 16 million types available, of which I've used just a few,
     so there is scope for others to be used.  Tags may be specified when a
     watchpoint is created to help distinguish the sources.

 (6) Records are filterable as types have up to 256 subtypes that can be
     individually filtered.  Other filtration is also available.

 (7) Notification pipes don't interfere with each other; each may be bound
     to a different set of watches.  Any particular notification will be
     copied to all the queues that are currently watching for it - and only
     those that are watching for it.

 (8) When recording a notification, the kernel will not sleep, but will
     rather mark a queue as having lost a message if there's insufficient
     space.  read() will fabricate a loss notification message at an
     appropriate point later.

 (9) The notification pipe is created and then watchpoints are attached to
     it, using one of:

	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
	watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
	watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);

     where in both cases, fd indicates the queue and the number after is a
     tag between 0 and 255.

(10) Watches are removed if either the notification pipe is destroyed or
     the watched object is destroyed.  In the latter case, a message will
     be generated indicating the enforced watch removal.


Things I want to avoid:

 (1) Introducing features that make the core VFS dependent on the network
     stack or networking namespaces (ie. usage of netlink).

 (2) Dumping all this stuff into dmesg and having a daemon that sits there
     parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky.  Further, dmesg might not exist or might be
     inaccessible inside a container.

 (3) Letting users see events they shouldn't be able to see.


TESTING AND MANPAGES
====================

 (*) The keyutils tree has a pipe-watch branch that has keyctl commands for
     making use of notifications.  Proposed manual pages can also be found
     on this branch, though a couple of them really need to go to the main
     manpages repository instead.

     If the kernel supports the watching of keys, then running "make test"
     on that branch will cause the testing infrastructure to spawn a
     monitoring process on the side that monitors a notifications pipe for
     all the key/keyring changes induced by the tests and they'll all be
     checked off to make sure they happened.

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch

 (*) A test program is provided (samples/watch_queue/watch_test) that can
     be used to monitor for keyrings, mount and superblock events.
     Information on the notifications is simply logged to stdout.

Thanks,
David
---
The following changes since commit f8788d86ab28f61f7b46eb6be375f8a726783636:

  Linux 5.6-rc3 (2020-02-23 16:17:42 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/notifications-20200330

for you to fetch changes up to 694435dbde3d1da79aafaf4cd680802f9eb229b7:

  smack: Implement the watch_key and post_notification hooks (2020-03-19 17:31:09 +0000)

----------------------------------------------------------------
Notifications over pipes

----------------------------------------------------------------
David Howells (11):
      uapi: General notification queue definitions
      security: Add hooks to rule on setting a watch
      security: Add a hook for the point of notification insertion
      pipe: Add O_NOTIFICATION_PIPE
      pipe: Add general notification queue support
      watch_queue: Add a key/keyring notification facility
      Add sample notification program
      pipe: Allow buffers to be marked read-whole-or-error for notifications
      pipe: Add notification lossage handling
      selinux: Implement the watch_key security hook
      smack: Implement the watch_key and post_notification hooks

 Documentation/security/keys/core.rst               |  57 ++
 Documentation/userspace-api/ioctl/ioctl-number.rst |   1 +
 Documentation/watch_queue.rst                      | 339 +++++++++++
 fs/pipe.c                                          | 242 +++++---
 fs/splice.c                                        |  12 +-
 include/linux/key.h                                |   3 +
 include/linux/lsm_audit.h                          |   1 +
 include/linux/lsm_hooks.h                          |  38 ++
 include/linux/pipe_fs_i.h                          |  27 +-
 include/linux/security.h                           |  31 +
 include/linux/watch_queue.h                        | 127 ++++
 include/uapi/linux/keyctl.h                        |   2 +
 include/uapi/linux/watch_queue.h                   | 104 ++++
 init/Kconfig                                       |  12 +
 kernel/Makefile                                    |   1 +
 kernel/watch_queue.c                               | 659 +++++++++++++++++++++
 samples/Kconfig                                    |   6 +
 samples/Makefile                                   |   1 +
 samples/watch_queue/Makefile                       |   7 +
 samples/watch_queue/watch_test.c                   | 186 ++++++
 security/keys/Kconfig                              |   9 +
 security/keys/compat.c                             |   3 +
 security/keys/gc.c                                 |   5 +
 security/keys/internal.h                           |  30 +-
 security/keys/key.c                                |  38 +-
 security/keys/keyctl.c                             |  99 +++-
 security/keys/keyring.c                            |  20 +-
 security/keys/request_key.c                        |   4 +-
 security/security.c                                |  23 +
 security/selinux/hooks.c                           |  14 +
 security/smack/smack_lsm.c                         |  83 ++-
 31 files changed, 2079 insertions(+), 105 deletions(-)
 create mode 100644 Documentation/watch_queue.rst
 create mode 100644 include/linux/watch_queue.h
 create mode 100644 include/uapi/linux/watch_queue.h
 create mode 100644 kernel/watch_queue.c
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-06-24  1:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-02 15:55 [GIT PULL] General notification queue and key notifications David Howells
2020-06-03  2:15 ` Ian Kent
2020-06-08  0:49   ` Ian Kent
2020-06-10  9:56 ` Christian Brauner
2020-06-10 11:12 ` Karel Zak
2020-06-12 21:32   ` Linus Torvalds
2020-06-12 22:01   ` Linus Torvalds
2020-06-13 13:04   ` David Howells
2020-06-13 16:47     ` Linus Torvalds
2020-06-13 17:03       ` Linus Torvalds
2020-06-13 19:22     ` Miklos Szeredi
2020-06-13 13:24   ` David Howells
2020-06-13 18:00 ` pr-tracker-bot
2020-06-17  1:15 ` Williams, Dan J
2020-06-23 23:38   ` Dan Williams
2020-06-24  0:55   ` David Howells
2020-06-24  1:03     ` Dan Williams
2020-06-24  1:17     ` David Howells
  -- strict thread matches above, loose matches on Subject: below --
2020-06-02 15:51 David Howells
2020-06-02 15:54 ` David Howells
2020-03-30 13:58 Upcoming: Notifications, FS notifications and fsinfo() David Howells
2020-03-30 14:31 ` [GIT PULL] General notification queue and key notifications David Howells
2020-03-31  6:51   ` Stephen Rothwell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).