linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] Keyrings, Block and USB notifications [ver #8]
@ 2019-09-04 22:15 David Howells
  2019-09-04 22:15 ` [PATCH 01/11] uapi: General notification ring definitions " David Howells
                   ` (12 more replies)
  0 siblings, 13 replies; 36+ messages in thread
From: David Howells @ 2019-09-04 22:15 UTC (permalink / raw)
  To: keyrings, linux-usb, linux-block
  Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
	Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner,
	dhowells, linux-security-module, linux-fsdevel, linux-api,
	linux-security-module, linux-kernel


Here's a set of patches to add a general notification queue concept and to
add event sources such as:

 (1) Keys/keyrings, such as linking and unlinking keys and changing their
     attributes.

 (2) General device events (single common queue) including:

     - Block layer events, such as device errors

     - USB subsystem events, such as device attach/remove, device reset,
       device errors.

I have patches for adding superblock and mount topology watches also,
though those are not in this set as there are other dependencies.

Tests for the key/keyring events can be found on the keyutils next branch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next

Notifications are done automatically inside of the testing infrastructure
on every change to that every test makes to a key or keyring.

Manual pages can be found there also, including pages for the
watch_queue(7) and watch_devices(2) system calls (these should be
transferred to the manpages package if taken upstream) and the
keyctl_watch_key(3) function.

LSM hooks are included:

 (1) A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set.  Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable.  The LSM
     should use current's credentials.  [Wanted by SELinux & Smack]

 (2) A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue.  This is given
     the credentials from the event generator (which may be the system) and
     the watch setter.  [Wanted by Smack]

I've provided SELinux and Smack with implementations of some of these hooks.


Design decisions:

 (1) A misc chardev is used to create and open a ring buffer:

	fd = open("/dev/watch_queue", O_RDWR);

     which is then configured and mmap'd into userspace:

	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);

     The fd cannot be read or written and userspace just pulls data out of
     the mapped buffer directly.

 (2) The ring index pointers are exposed to userspace through the buffer.
     Userspace should only update the tail pointer and never the head
     pointer or risk breaking the buffer.  The kernel checks that the
     pointers appear valid before trying to use them.

 (3) The ring pointers are held inside the ring itself at the front inside
     a special 'skip' record.  This means it's not necessary to allocate an
     extra locked page just for them - which would be contributory to the
     locked memory rlimit.

 (3) poll() can be used to wait for data to appear in the buffer.

 (4) Records in the buffer are binary, typed and have a length so that they
     can be of varying size.

     This allows multiple heterogeneous sources to share a common buffer;
     there are 16 million types available, of which I've used just a few,
     so there is scope for others to be used.  Tags may be specified when a
     watchpoint is created to help distinguish the sources.

 (5) Records are filterable as types have up to 256 subtypes that can be
     individually filtered.  Other filtration is also available.

 (6) Each time the buffer is opened, a new buffer is created - this means
     that there's no interference between watchers.

 (7) When recording a notification, the kernel will not sleep, but will
     rather mark a queue as overrun if there's insufficient space, thereby
     avoiding userspace causing the kernel to hang.  This does require the
     buffer to be locked into memory.

 (8) The 'watchpoint' should be specific where possible, meaning that you
     specify the object that you want to watch.

 (9) The buffer is created and then watchpoints are attached to it, using
     one of:

	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
	watch_devices(fd, 0x02, 0);

     where in both cases, fd indicates the queue and the number after is a
     tag between 0 and 255.

(10) Watches are removed if either the watch buffer is destroyed or the
     watched object is destroyed.


Things I want to avoid:

 (1) Introducing features that make the core VFS dependent on the network
     stack or networking namespaces (ie. usage of netlink).

 (2) Dumping all this stuff into dmesg and having a daemon that sits there
     parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky.  Further, dmesg might not exist or might be
     inaccessible inside a container.

 (3) Letting users see events they shouldn't be able to see.


The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications-core

Changes:

 ver #8:

 (*) Added comments on the kernel-side memory barriers for the ring buffer.

 (*) Reworked the filter check function to remove the hard-coded numbers.

 (*) Removed the USB bus notifications for now as there was a bug in there
     causing a crash.

 (*) Added syscall hooks for arm64.

 ver #7:

 (*) Removed the 'watch' argument from the security_watch_key() and
     security_watch_devices() hooks as current_cred() can be used instead
     of watch->cred.

 ver #6:

 (*) Fix mmap bug in watch_queue driver.

 (*) Add an extended removal notification that can transmit an identifier
     to userspace (such as a key ID).

 (*) Don't produce a instantiation notification in mark_key_instantiated()
     but rather do it in the caller to prevent key updates from producing
     an instantiate notification as well as an update notification.

 (*) Set the right number of filters in the sample program.

 (*) Provide preliminary hook implementations for SELinux and Smack.

 ver #5:

 (*) Split the superblock watch and mount watch parts out into their own
     branch (notifications-mount) as they really need certain fsinfo()
     attributes.

 (*) Rearrange the watch notification UAPI header to push the length down
     to bits 0-5 and remove the lost-message bits.  The userspace's watch
     ID tag is moved to bits 8-15 and then the message type is allocated
     all of bits 16-31 for its own purposes.

     The lost-message bit is moved over to the header, rather than being
     placed in the next message to be generated and given its own word so
     it can be cleared with xchg(,0) for parisc.

 (*) The security_post_notification() hook is no longer called with the
     spinlock held and softirqs disabled - though the RCU readlock is still
     held.

 (*) Buffer pages are now accounted towards RLIMIT_MEMLOCK and CAP_IPC_LOCK
     will skip the overuse check.

 (*) The buffer is marked VM_DONTEXPAND.

 (*) Save the watch-setter's creds in struct watch and give that to the LSM
     hook for posting a message.

 ver #4:

 (*) Split the basic UAPI bits out into their own patch and then split the
     LSM hooks out into an intermediate patch.  Add LSM hooks for setting
     watches.

     Rename the *_notify() system calls to watch_*() for consistency.

 ver #3:

 (*) I've added a USB notification source and reformulated the block
     notification source so that there's now a common watch list, for which
     the system call is now device_notify().

     I've assigned a pair of unused ioctl numbers in the 'W' series to the
     ioctls added by this series.

     I've also added a description of the kernel API to the documentation.

 ver #2:

 (*) I've fixed various issues raised by Jann Horn and GregKH and moved to
     krefs for refcounting.  I've added some security features to try and
     give Casey Schaufler the LSM control he wants.

David
---
David Howells (11):
      uapi: General notification ring definitions
      security: Add hooks to rule on setting a watch
      security: Add a hook for the point of notification insertion
      General notification queue with user mmap()'able ring buffer
      keys: Add a notification facility
      Add a general, global device notification watch list
      block: Add block layer notifications
      usb: Add USB subsystem notifications
      Add sample notification program
      selinux: Implement the watch_key security hook
      smack: Implement the watch_key and post_notification hooks


 Documentation/ioctl/ioctl-number.rst        |    1 
 Documentation/security/keys/core.rst        |   58 ++
 Documentation/watch_queue.rst               |  460 ++++++++++++++
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 
 arch/arm/tools/syscall.tbl                  |    1 
 arch/arm64/include/asm/unistd.h             |    2 
 arch/arm64/include/asm/unistd32.h           |    2 
 arch/ia64/kernel/syscalls/syscall.tbl       |    1 
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 
 arch/s390/kernel/syscalls/syscall.tbl       |    1 
 arch/sh/kernel/syscalls/syscall.tbl         |    1 
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 
 block/Kconfig                               |    9 
 block/blk-core.c                            |   29 +
 drivers/base/Kconfig                        |    9 
 drivers/base/Makefile                       |    1 
 drivers/base/watch.c                        |   90 +++
 drivers/misc/Kconfig                        |   13 
 drivers/misc/Makefile                       |    1 
 drivers/misc/watch_queue.c                  |  898 +++++++++++++++++++++++++++
 drivers/usb/core/Kconfig                    |    9 
 drivers/usb/core/devio.c                    |   49 +
 drivers/usb/core/hub.c                      |    4 
 include/linux/blkdev.h                      |   15 
 include/linux/device.h                      |    7 
 include/linux/key.h                         |    3 
 include/linux/lsm_audit.h                   |    1 
 include/linux/lsm_hooks.h                   |   38 +
 include/linux/sched/user.h                  |    3 
 include/linux/security.h                    |   32 +
 include/linux/syscalls.h                    |    1 
 include/linux/usb.h                         |   18 +
 include/linux/watch_queue.h                 |   94 +++
 include/uapi/asm-generic/unistd.h           |    4 
 include/uapi/linux/keyctl.h                 |    2 
 include/uapi/linux/watch_queue.h            |  181 +++++
 kernel/sys_ni.c                             |    1 
 samples/Kconfig                             |    6 
 samples/Makefile                            |    1 
 samples/watch_queue/Makefile                |    8 
 samples/watch_queue/watch_test.c            |  231 +++++++
 security/keys/Kconfig                       |    9 
 security/keys/compat.c                      |    3 
 security/keys/gc.c                          |    5 
 security/keys/internal.h                    |   30 +
 security/keys/key.c                         |   38 +
 security/keys/keyctl.c                      |   99 +++
 security/keys/keyring.c                     |   20 -
 security/keys/request_key.c                 |    4 
 security/security.c                         |   23 +
 security/selinux/hooks.c                    |   14 
 security/smack/smack_lsm.c                  |   82 ++
 61 files changed, 2592 insertions(+), 32 deletions(-)
 create mode 100644 Documentation/watch_queue.rst
 create mode 100644 drivers/base/watch.c
 create mode 100644 drivers/misc/watch_queue.c
 create mode 100644 include/linux/watch_queue.h
 create mode 100644 include/uapi/linux/watch_queue.h
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-09-06 21:19 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-04 22:15 [PATCH 00/11] Keyrings, Block and USB notifications [ver #8] David Howells
2019-09-04 22:15 ` [PATCH 01/11] uapi: General notification ring definitions " David Howells
2019-09-04 22:16 ` [PATCH 02/11] security: Add hooks to rule on setting a watch " David Howells
2019-09-04 22:16 ` [PATCH 03/11] security: Add a hook for the point of notification insertion " David Howells
2019-09-04 22:16 ` [PATCH 04/11] General notification queue with user mmap()'able ring buffer " David Howells
2019-09-04 22:16 ` [PATCH 05/11] keys: Add a notification facility " David Howells
2019-09-04 22:16 ` [PATCH 06/11] Add a general, global device notification watch list " David Howells
2019-09-04 22:16 ` [PATCH 07/11] block: Add block layer notifications " David Howells
2019-09-04 22:16 ` [PATCH 08/11] usb: Add USB subsystem " David Howells
2019-09-04 22:17 ` [PATCH 09/11] Add sample notification program " David Howells
2019-09-04 22:17 ` [PATCH 10/11] selinux: Implement the watch_key security hook " David Howells
2019-09-04 22:17 ` [PATCH 11/11] smack: Implement the watch_key and post_notification hooks " David Howells
2019-09-04 22:28 ` [PATCH 00/11] Keyrings, Block and USB notifications " Linus Torvalds
2019-09-05 17:01 ` Why add the general notification queue and its sources David Howells
2019-09-05 17:19   ` Linus Torvalds
2019-09-05 18:32     ` Ray Strode
2019-09-05 20:39       ` Linus Torvalds
2019-09-06 19:32         ` Ray Strode
2019-09-06 19:41           ` Ray Strode
2019-09-06 19:53           ` Robbie Harwood
2019-09-05 21:32       ` David Howells
2019-09-05 22:08         ` Linus Torvalds
2019-09-05 23:18         ` David Howells
2019-09-06  0:07           ` Linus Torvalds
2019-09-06 10:09           ` David Howells
2019-09-06 15:35             ` Linus Torvalds
2019-09-06 15:53               ` Linus Torvalds
2019-09-06 16:12                 ` Steven Whitehouse
2019-09-06 17:07                   ` Linus Torvalds
2019-09-06 17:14                     ` Linus Torvalds
2019-09-06 21:19                       ` David Howells
2019-09-06 17:14                   ` Andy Lutomirski
2019-09-05 18:37     ` Steven Whitehouse
2019-09-05 18:51       ` Ray Strode
2019-09-05 20:09         ` David Lehman
2019-09-05 18:33   ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).