* [PATCH 00/11] Keyrings, Block and USB notifications [ver #8]
From: David Howells @ 2019-09-04 22:15 UTC
To: keyrings, linux-usb, linux-block
Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
    Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner,
    dhowells, linux-security-module, linux-fsdevel, linux-api,
    linux-security-module, linux-kernel


Here's a set of patches to add a general notification queue concept and to
add event sources such as:

 (1) Keys/keyrings, such as linking and unlinking keys and changing their
     attributes.

 (2) General device events (single common queue) including:

     - Block layer events, such as device errors

     - USB subsystem events, such as device attach/remove, device reset,
       device errors.

I have patches for adding superblock and mount topology watches also,
though those are not in this set as there are other dependencies.


Tests for the key/keyring events can be found on the keyutils next branch:

    https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next

Notifications are done automatically inside of the testing infrastructure
on every change that every test makes to a key or keyring.

Manual pages can be found there also, including pages for watch_queue(7)
and the watch_devices(2) system call (these should be transferred to the
manpages package if taken upstream) and the keyctl_watch_key(3) function.

LSM hooks are included:

 (1) A set of hooks is provided that allows an LSM to rule on whether or
     not a watch may be set.  Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable.  The LSM
     should use current's credentials.  [Wanted by SELinux & Smack]

 (2) A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue.
     This is given the credentials from the event generator (which may be
     the system) and the watch setter.  [Wanted by Smack]

I've provided SELinux and Smack with implementations of some of these hooks.


Design decisions:

 (1) A misc chardev is used to create and open a ring buffer:

        fd = open("/dev/watch_queue", O_RDWR);

     which is then configured and mmap'd into userspace:

        ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
        ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
        buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);

     The fd cannot be read or written and userspace just pulls data out of
     the mapped buffer directly.

 (2) The ring index pointers are exposed to userspace through the buffer.
     Userspace should only update the tail pointer and never the head
     pointer or risk breaking the buffer.  The kernel checks that the
     pointers appear valid before trying to use them.

 (3) The ring pointers are held inside the ring itself at the front inside
     a special 'skip' record.  This means it's not necessary to allocate an
     extra locked page just for them - which would count towards the
     locked memory rlimit.

 (4) poll() can be used to wait for data to appear in the buffer.

 (5) Records in the buffer are binary, typed and have a length so that they
     can be of varying size.

     This allows multiple heterogeneous sources to share a common buffer;
     there are 16 million types available, of which I've used just a few,
     so there is scope for others to be used.  Tags may be specified when a
     watchpoint is created to help distinguish the sources.

 (6) Records are filterable as types have up to 256 subtypes that can be
     individually filtered.  Other filtration is also available.

 (7) Each time the buffer is opened, a new buffer is created - this means
     that there's no interference between watchers.

 (8) When recording a notification, the kernel will not sleep, but will
     rather mark a queue as overrun if there's insufficient space, thereby
     avoiding userspace causing the kernel to hang.  This does require the
     buffer to be locked into memory.

 (9) The 'watchpoint' should be specific where possible, meaning that you
     specify the object that you want to watch.

(10) The buffer is created and then watchpoints are attached to it, using
     one of:

        keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
        watch_devices(fd, 0x02, 0);

     where in both cases, fd indicates the queue and the number after is a
     tag between 0 and 255.

(11) Watches are removed if either the watch buffer is destroyed or the
     watched object is destroyed.


Things I want to avoid:

 (1) Introducing features that make the core VFS dependent on the network
     stack or networking namespaces (ie. usage of netlink).

 (2) Dumping all this stuff into dmesg and having a daemon that sits there
     parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky.  Further, dmesg might not exist or might be
     inaccessible inside a container.

 (3) Letting users see events they shouldn't be able to see.


The patches can be found here also:

    http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications-core

Changes:

 ver #8:

 (*) Added comments on the kernel-side memory barriers for the ring buffer.

 (*) Reworked the filter check function to remove the hard-coded numbers.

 (*) Removed the USB bus notifications for now as there was a bug in there
     causing a crash.

 (*) Added syscall hooks for arm64.

 ver #7:

 (*) Removed the 'watch' argument from the security_watch_key() and
     security_watch_devices() hooks as current_cred() can be used instead
     of watch->cred.

 ver #6:

 (*) Fix mmap bug in watch_queue driver.

 (*) Add an extended removal notification that can transmit an identifier
     to userspace (such as a key ID).

 (*) Don't produce an instantiation notification in
     mark_key_instantiated() but rather do it in the caller to prevent key
     updates from producing an instantiate notification as well as an
     update notification.

 (*) Set the right number of filters in the sample program.

 (*) Provide preliminary hook implementations for SELinux and Smack.

 ver #5:

 (*) Split the superblock watch and mount watch parts out into their own
     branch (notifications-mount) as they really need certain fsinfo()
     attributes.

 (*) Rearrange the watch notification UAPI header to push the length down
     to bits 0-5 and remove the lost-message bits.  The userspace's watch
     ID tag is moved to bits 8-15 and then the message type is allocated
     all of bits 16-31 for its own purposes.

     The lost-message bit is moved over to the header, rather than being
     placed in the next message to be generated, and given its own word so
     it can be cleared with xchg(,0) for parisc.

 (*) The security_post_notification() hook is no longer called with the
     spinlock held and softirqs disabled - though the RCU readlock is still
     held.

 (*) Buffer pages are now accounted towards RLIMIT_MEMLOCK and CAP_IPC_LOCK
     will skip the overuse check.

 (*) The buffer is marked VM_DONTEXPAND.

 (*) Save the watch-setter's creds in struct watch and give that to the LSM
     hook for posting a message.

 ver #4:

 (*) Split the basic UAPI bits out into their own patch and then split the
     LSM hooks out into an intermediate patch.  Add LSM hooks for setting
     watches.

     Rename the *_notify() system calls to watch_*() for consistency.

 ver #3:

 (*) I've added a USB notification source and reformulated the block
     notification source so that there's now a common watch list, for which
     the system call is now device_notify().

     I've assigned a pair of unused ioctl numbers in the 'W' series to the
     ioctls added by this series.

     I've also added a description of the kernel API to the documentation.

 ver #2:

 (*) I've fixed various issues raised by Jann Horn and GregKH and moved to
     krefs for refcounting.  I've added some security features to try and
     give Casey Schaufler the LSM control he wants.

David
---
David Howells (11):
      uapi: General notification ring definitions
      security: Add hooks to rule on setting a watch
      security: Add a hook for the point of notification insertion
      General notification queue with user mmap()'able ring buffer
      keys: Add a notification facility
      Add a general, global device notification watch list
      block: Add block layer notifications
      usb: Add USB subsystem notifications
      Add sample notification program
      selinux: Implement the watch_key security hook
      smack: Implement the watch_key and post_notification hooks


 Documentation/ioctl/ioctl-number.rst        |   1
 Documentation/security/keys/core.rst        |  58 ++
 Documentation/watch_queue.rst               | 460 ++++++++++++++
 arch/alpha/kernel/syscalls/syscall.tbl      |   1
 arch/arm/tools/syscall.tbl                  |   1
 arch/arm64/include/asm/unistd.h             |   2
 arch/arm64/include/asm/unistd32.h           |   2
 arch/ia64/kernel/syscalls/syscall.tbl       |   1
 arch/m68k/kernel/syscalls/syscall.tbl       |   1
 arch/microblaze/kernel/syscalls/syscall.tbl |   1
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   1
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   1
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   1
 arch/parisc/kernel/syscalls/syscall.tbl     |   1
 arch/powerpc/kernel/syscalls/syscall.tbl    |   1
 arch/s390/kernel/syscalls/syscall.tbl       |   1
 arch/sh/kernel/syscalls/syscall.tbl         |   1
 arch/sparc/kernel/syscalls/syscall.tbl      |   1
 arch/x86/entry/syscalls/syscall_32.tbl      |   1
 arch/x86/entry/syscalls/syscall_64.tbl      |   1
 arch/xtensa/kernel/syscalls/syscall.tbl     |   1
 block/Kconfig                               |   9
 block/blk-core.c                            |  29 +
 drivers/base/Kconfig                        |   9
 drivers/base/Makefile                       |   1
 drivers/base/watch.c                        |  90 +++
 drivers/misc/Kconfig                        |  13
 drivers/misc/Makefile                       |   1
 drivers/misc/watch_queue.c                  | 898 +++++++++++++++++++++++++++
 drivers/usb/core/Kconfig                    |   9
 drivers/usb/core/devio.c                    |  49 +
 drivers/usb/core/hub.c                      |   4
 include/linux/blkdev.h                      |  15
 include/linux/device.h                      |   7
 include/linux/key.h                         |   3
 include/linux/lsm_audit.h                   |   1
 include/linux/lsm_hooks.h                   |  38 +
 include/linux/sched/user.h                  |   3
 include/linux/security.h                    |  32 +
 include/linux/syscalls.h                    |   1
 include/linux/usb.h                         |  18 +
 include/linux/watch_queue.h                 |  94 +++
 include/uapi/asm-generic/unistd.h           |   4
 include/uapi/linux/keyctl.h                 |   2
 include/uapi/linux/watch_queue.h            | 181 +++++
 kernel/sys_ni.c                             |   1
 samples/Kconfig                             |   6
 samples/Makefile                            |   1
 samples/watch_queue/Makefile                |   8
 samples/watch_queue/watch_test.c            | 231 +++++++
 security/keys/Kconfig                       |   9
 security/keys/compat.c                      |   3
 security/keys/gc.c                          |   5
 security/keys/internal.h                    |  30 +
 security/keys/key.c                         |  38 +
 security/keys/keyctl.c                      |  99 +++
 security/keys/keyring.c                     |  20 -
 security/keys/request_key.c                 |   4
 security/security.c                         |  23 +
 security/selinux/hooks.c                    |  14
 security/smack/smack_lsm.c                  |  82 ++
 61 files changed, 2592 insertions(+), 32 deletions(-)
 create mode 100644 Documentation/watch_queue.rst
 create mode 100644 drivers/base/watch.c
 create mode 100644 drivers/misc/watch_queue.c
 create mode 100644 include/linux/watch_queue.h
 create mode 100644 include/uapi/linux/watch_queue.h
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c
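
As a rough illustration of the record layout and consumer loop described in
the cover letter, here is a minimal sketch.  It is not code from the
patches.  The bit positions (length in bits 0-5, watch ID tag in bits 8-15,
type-specific info in bits 16-31) and the 24-bit type / 8-bit subtype split
are taken from the text above; everything else is an assumption made for
the example: the macro and function names, the struct slot layout, the
value used for the 'skip' record type, and the idea that the length counts
ring slots rather than bytes.  The ring here is an ordinary array; a real
consumer of a shared mapping would also need acquire/release ordering on
the head and tail accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks derived from the ver #5 changelog text; names are hypothetical. */
#define INFO_LENGTH_MASK 0x0000003fu   /* bits 0-5: record length */
#define INFO_ID_MASK     0x0000ff00u   /* bits 8-15: watch ID tag */
#define INFO_ID_SHIFT    8
#define INFO_TYPE_MASK   0xffff0000u   /* bits 16-31: type-specific info */
#define INFO_TYPE_SHIFT  16

#define TYPE_MASK 0x00ffffffu          /* 24-bit type ("16 million types") */
#define TYPE_META 0u                   /* assumed value for 'skip' records */

/* Hypothetical ring slot: one word of type/subtype, one word of info. */
struct slot {
    uint32_t type;   /* low 24 bits: type, high 8 bits: subtype (assumed) */
    uint32_t info;
};

/* Pack length, watch ID and type-specific info into one info word. */
static inline uint32_t make_info(uint32_t len, uint32_t id, uint32_t type_info)
{
    return (len & INFO_LENGTH_MASK) |
           ((id << INFO_ID_SHIFT) & INFO_ID_MASK) |
           ((type_info << INFO_TYPE_SHIFT) & INFO_TYPE_MASK);
}

static inline uint32_t info_length(uint32_t info)
{
    return info & INFO_LENGTH_MASK;
}

static inline uint32_t info_id(uint32_t info)
{
    return (info & INFO_ID_MASK) >> INFO_ID_SHIFT;
}

static inline uint32_t info_type_info(uint32_t info)
{
    return (info & INFO_TYPE_MASK) >> INFO_TYPE_SHIFT;
}

/*
 * Walk records from *tail up to head and count those carrying watch ID 'id'.
 * Skip records are stepped over by their length like any other record.
 * head and *tail are free-running slot indices; nslots must be a power of
 * two.  Returns the count and advances *tail, or returns -1 and leaves
 * *tail untouched if a record has a zero or overlong length.
 */
static int drain_ring(const struct slot *ring, uint32_t nslots,
                      uint32_t head, uint32_t *tail, uint32_t id)
{
    int matched = 0;
    uint32_t t = *tail;

    assert(nslots != 0 && (nslots & (nslots - 1)) == 0);

    while (t != head) {
        const struct slot *rec = &ring[t & (nslots - 1)];
        uint32_t len = info_length(rec->info);

        if (len == 0 || len > head - t)
            return -1;          /* malformed record: refuse to advance */
        if ((rec->type & TYPE_MASK) != TYPE_META && info_id(rec->info) == id)
            matched++;
        t += len;
    }

    *tail = t;                  /* only the tail is ever written by the consumer */
    return matched;
}
```

The consumer only ever writes the tail index, matching design decision (2),
and it refuses to advance past a zero-length record so that a corrupted ring
cannot make it spin forever.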
* [PATCH 01/11] uapi: General notification ring definitions [ver #8]
  2019-09-04 22:15 ` David Howells
  (?)
@ 2019-09-04 22:15 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-04 22:15 UTC (permalink / raw)
  To: keyrings, linux-usb, linux-block
  Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
	Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner,
	dhowells, linux-security-module, linux-fsdevel, linux-api,
	linux-security-module, linux-kernel

Add UAPI definitions for the general notification ring, including the
following pieces:

 (1) struct watch_notification.

     This is the metadata header for each entry in the ring.  It includes a
     type and subtype that indicate the source of the message
     (eg. WATCH_TYPE_MOUNT_NOTIFY) and the kind of the message
     (eg. NOTIFY_MOUNT_NEW_MOUNT).

     The header also contains an information field that conveys the
     following information:

	- WATCH_INFO_LENGTH.  The size of the entry (entries are variable
	  length).

	- WATCH_INFO_ID.  The watch ID specified when the watchpoint was set.

	- WATCH_INFO_TYPE_INFO.  (Sub)type-specific information.

	- WATCH_INFO_FLAG_*.  Flag bits overlain on the type-specific
	  information.  For use by the type.

     All the information in the header can be used in filtering messages at
     the point of writing into the buffer.

 (2) struct watch_queue_buffer.

     This describes the layout of the ring.  Note that the first slots in
     the ring contain a special metadata entry that contains the ring
     pointers.  The producer in the kernel knows to skip this and it has a
     proper header (WATCH_TYPE_META, WATCH_META_SKIP_NOTIFICATION) that
     indicates the size so that the ring consumer can handle it the same as
     any other record and just skip it.

     Note that this means that ring entries can never be split over the end
     of the ring, so if an entry would need to be split, a skip record is
     inserted to wrap the ring first; this is also WATCH_TYPE_META,
     WATCH_META_SKIP_NOTIFICATION.

 (3) WATCH_INFO_NOTIFICATIONS_LOST.

     This is a flag that can be set in the metadata header by the kernel to
     indicate that at least one message was lost since it was last cleared
     by userspace.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

 include/uapi/linux/watch_queue.h | 67 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 include/uapi/linux/watch_queue.h

diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
new file mode 100644
index 000000000000..70f575099968
--- /dev/null
+++ b/include/uapi/linux/watch_queue.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_WATCH_QUEUE_H
+#define _UAPI_LINUX_WATCH_QUEUE_H
+
+#include <linux/types.h>
+
+enum watch_notification_type {
+	WATCH_TYPE_META		= 0,	/* Special record */
+	WATCH_TYPE___NR		= 1
+};
+
+enum watch_meta_notification_subtype {
+	WATCH_META_SKIP_NOTIFICATION	= 0,	/* Just skip this record */
+	WATCH_META_REMOVAL_NOTIFICATION	= 1,	/* Watched object was removed */
+};
+
+#define WATCH_LENGTH_GRANULARITY sizeof(__u64)
+
+/*
+ * Notification record header.  This is aligned to 64-bits so that subclasses
+ * can contain __u64 fields.
+ */
+struct watch_notification {
+	__u32			type:24;	/* enum watch_notification_type */
+	__u32			subtype:8;	/* Type-specific subtype (filterable) */
+	__u32			info;
+#define WATCH_INFO_LENGTH	0x0000003f	/* Length of record / sizeof(watch_notification) */
+#define WATCH_INFO_LENGTH__SHIFT 0
+#define WATCH_INFO_ID		0x0000ff00	/* ID of watchpoint, if type-appropriate */
+#define WATCH_INFO_ID__SHIFT	8
+#define WATCH_INFO_TYPE_INFO	0xffff0000	/* Type-specific info */
+#define WATCH_INFO_TYPE_INFO__SHIFT 16
+#define WATCH_INFO_FLAG_0	0x00010000	/* Type-specific info, flag bit 0 */
+#define WATCH_INFO_FLAG_1	0x00020000	/* ... */
+#define WATCH_INFO_FLAG_2	0x00040000
+#define WATCH_INFO_FLAG_3	0x00080000
+#define WATCH_INFO_FLAG_4	0x00100000
+#define WATCH_INFO_FLAG_5	0x00200000
+#define WATCH_INFO_FLAG_6	0x00400000
+#define WATCH_INFO_FLAG_7	0x00800000
+} __attribute__((aligned(WATCH_LENGTH_GRANULARITY)));
+
+struct watch_queue_buffer {
+	union {
+		/* The first few entries are special, containing the
+		 * ring management variables.
+		 */
+		struct {
+			struct watch_notification watch; /* WATCH_TYPE_META */
+			__u32		head;		/* Ring head index */
+			__u32		tail;		/* Ring tail index */
+			__u32		mask;		/* Ring index mask */
+			__u32		__reserved;
+		} meta;
+		struct watch_notification slots[0];
+	};
+};
+
+/*
+ * The Metadata pseudo-notification message uses a flag bit in the information
+ * field to convey the fact that messages have been lost.  We can only use a
+ * single bit in this manner per word as some arches that support SMP
+ * (eg. parisc) have no kernel<->user atomic bit ops.
+ */
+#define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0
+
+#endif /* _UAPI_LINUX_WATCH_QUEUE_H */

^ permalink raw reply related	[flat|nested] 234+ messages in thread
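To illustrate how a consumer might use the header described in patch 01, here is a minimal userspace sketch, not code from the series. The masks, shifts and field layout are copied from the patch. The helper names (watch_len_slots, watch_id, watch_type_info, count_records) are hypothetical. The walk assumes that head and tail are free-running indices counted in slots of sizeof(struct watch_notification), which the patch text implies but does not spell out. Memory ordering on the shared head and tail is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch's include/uapi/linux/watch_queue.h. */
enum { WATCH_TYPE_META = 0 };
enum { WATCH_META_SKIP_NOTIFICATION = 0 };
#define WATCH_INFO_LENGTH		0x0000003f
#define WATCH_INFO_LENGTH__SHIFT	0
#define WATCH_INFO_ID			0x0000ff00
#define WATCH_INFO_ID__SHIFT		8
#define WATCH_INFO_TYPE_INFO		0xffff0000
#define WATCH_INFO_TYPE_INFO__SHIFT	16

/* Same fields as the patch's header, written with portable C types. */
struct watch_notification {
	unsigned int	type:24;
	unsigned int	subtype:8;
	uint32_t	info;
};

/* Hypothetical helpers (not in the patch) that unpack the info word. */
static unsigned int watch_len_slots(const struct watch_notification *n)
{
	return (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
}

static unsigned int watch_id(const struct watch_notification *n)
{
	return (n->info & WATCH_INFO_ID) >> WATCH_INFO_ID__SHIFT;
}

static unsigned int watch_type_info(const struct watch_notification *n)
{
	return (n->info & WATCH_INFO_TYPE_INFO) >> WATCH_INFO_TYPE_INFO__SHIFT;
}

/*
 * Hypothetical consumer step: walk records from tail to head and count the
 * ones that are not skip records.  Stops early on a zero or overlong length
 * instead of looping forever.  The final tail is returned via new_tail.
 */
static unsigned int count_records(const struct watch_notification *slots,
				  uint32_t head, uint32_t tail, uint32_t mask,
				  uint32_t *new_tail)
{
	unsigned int seen = 0;

	assert(slots != NULL && new_tail != NULL);
	while (tail != head) {
		const struct watch_notification *n = &slots[tail & mask];
		unsigned int len = watch_len_slots(n);

		if (len == 0 || len > head - tail)
			break;	/* malformed record */
		if (!(n->type == WATCH_TYPE_META &&
		      n->subtype == WATCH_META_SKIP_NOTIFICATION))
			seen++;
		tail += len;	/* length is in slots, so this lands on the next header */
	}
	*new_tail = tail;
	return seen;
}
```

Because skip records carry an ordinary header with a length, the loop needs no special case for the metadata entry at the front of the ring or for the padding inserted before a wrap; it only has to avoid counting them.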
* [PATCH 02/11] security: Add hooks to rule on setting a watch [ver #8]
  2019-09-04 22:15 ` David Howells
  (?)
@ 2019-09-04 22:16 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw)
  To: keyrings, linux-usb, linux-block
  Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
	Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner,
	dhowells, linux-security-module, linux-fsdevel, linux-api,
	linux-security-module, linux-kernel

Add security hooks that will allow an LSM to rule on whether or not a watch
may be set.  More than one hook is required as the watches watch different
types of object.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: Stephen Smalley <sds@tycho.nsa.gov>
cc: linux-security-module@vger.kernel.org
---

 include/linux/lsm_hooks.h | 24 ++++++++++++++++++++++++
 include/linux/security.h | 17 +++++++++++++++++
 security/security.c | 14 ++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index df1318d85f7d..b0cdefcda4e6 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1413,6 +1413,18 @@
  *	@ctx is a pointer in which to place the allocated security context.
  *	@ctxlen points to the place to put the length of @ctx.
  *
+ * Security hooks for the general notification queue:
+ *
+ * @watch_key:
+ *	Check to see if a process is allowed to watch for event notifications
+ *	from a key or keyring.
+ *	@key: The key to watch.
+ *
+ * @watch_devices:
+ *	Check to see if a process is allowed to watch for event notifications
+ *	from devices (as a global set).
+ *
+ *
  * Security hooks for using the eBPF maps and programs functionalities through
  * eBPF syscalls.
  *
@@ -1688,6 +1700,12 @@ union security_list_options {
 	int (*inode_notifysecctx)(struct inode *inode, void *ctx, u32 ctxlen);
 	int (*inode_setsecctx)(struct dentry *dentry, void *ctx, u32 ctxlen);
 	int (*inode_getsecctx)(struct inode *inode, void **ctx, u32 *ctxlen);
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	int (*watch_key)(struct key *key);
+#endif
+#ifdef CONFIG_DEVICE_NOTIFICATIONS
+	int (*watch_devices)(void);
+#endif
 
 #ifdef CONFIG_SECURITY_NETWORK
 	int (*unix_stream_connect)(struct sock *sock, struct sock *other,
@@ -1964,6 +1982,12 @@ struct security_hook_heads {
 	struct hlist_head inode_notifysecctx;
 	struct hlist_head inode_setsecctx;
 	struct hlist_head inode_getsecctx;
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	struct hlist_head watch_key;
+#endif
+#ifdef CONFIG_DEVICE_NOTIFICATIONS
+	struct hlist_head watch_devices;
+#endif
 #ifdef CONFIG_SECURITY_NETWORK
 	struct hlist_head unix_stream_connect;
 	struct hlist_head unix_may_send;
diff --git a/include/linux/security.h b/include/linux/security.h
index 5f7441abbf42..3be44354d308 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1206,6 +1206,23 @@ static inline int security_inode_getsecctx(struct inode *inode, void **ctx, u32
 }
 #endif	/* CONFIG_SECURITY */
 
+#if defined(CONFIG_SECURITY) && defined(CONFIG_KEY_NOTIFICATIONS)
+int security_watch_key(struct key *key);
+#else
+static inline int security_watch_key(struct key *key)
+{
+	return 0;
+}
+#endif
+#if defined(CONFIG_SECURITY) && defined(CONFIG_DEVICE_NOTIFICATIONS)
+int security_watch_devices(void);
+#else
+static inline int security_watch_devices(void)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SECURITY_NETWORK
 
 int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk);
diff --git a/security/security.c b/security/security.c
index 250ee2d76406..007eb48bc848 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1916,6 +1916,20 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
 }
 EXPORT_SYMBOL(security_inode_getsecctx);
 
+#ifdef CONFIG_KEY_NOTIFICATIONS
+int security_watch_key(struct key *key)
+{
+	return call_int_hook(watch_key, 0, key);
+}
+#endif
+
+#ifdef CONFIG_DEVICE_NOTIFICATIONS
+int security_watch_devices(void)
+{
+	return call_int_hook(watch_devices, 0);
+}
+#endif
+
 #ifdef CONFIG_SECURITY_NETWORK
 
 int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk)

^ permalink raw reply related	[flat|nested] 234+ messages in thread
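For readers unfamiliar with how security_watch_key() reaches an LSM, the following is a simplified userspace model of the dispatch, not kernel code. It assumes the usual call_int_hook() behaviour of starting from the default value (0 here) and stopping at the first registered hook that returns non-zero. The struct key stand-in, the fixed-size hook table and every function name below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's struct key; only a serial number is modelled. */
struct key { int serial; };

/* Callback type with the same shape as the watch_key hook in the patch. */
typedef int (*watch_key_hook_t)(struct key *key);

#define MAX_HOOKS 4
static watch_key_hook_t watch_key_hooks[MAX_HOOKS];
static size_t nr_watch_key_hooks;

/* Hypothetical registration helper; the kernel uses hlists instead. */
static int register_watch_key_hook(watch_key_hook_t fn)
{
	assert(fn != NULL);
	if (nr_watch_key_hooks == MAX_HOOKS)
		return -ENOMEM;
	watch_key_hooks[nr_watch_key_hooks++] = fn;
	return 0;
}

/* Model of security_watch_key(): default 0, first non-zero result wins. */
static int model_security_watch_key(struct key *key)
{
	int rc = 0;

	for (size_t i = 0; i < nr_watch_key_hooks; i++) {
		rc = watch_key_hooks[i](key);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Two toy policies, purely for illustration. */
static int allow_all(struct key *key)
{
	(void)key;
	return 0;
}

static int deny_odd_serials(struct key *key)
{
	return (key->serial & 1) ? -EACCES : 0;
}
```

With no hooks registered the model returns 0, which mirrors the static inline stubs in security.h that are used when CONFIG_SECURITY or the relevant notification option is disabled.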
* [PATCH 03/11] security: Add a hook for the point of notification insertion [ver #8]
  2019-09-04 22:15 ` David Howells
  (?)
@ 2019-09-04 22:16 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw)
  To: keyrings, linux-usb, linux-block
  Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
	Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner,
	dhowells, linux-security-module, linux-fsdevel, linux-api,
	linux-security-module, linux-kernel

Add a security hook that allows an LSM to rule on whether a notification
message is allowed to be inserted into a particular watch queue.

The hook is given the following information:

 (1) The credentials of the triggerer (which may be init_cred for a system
     notification, eg. a hardware error).

 (2) The credentials of whoever set the watch.

 (3) The notification message.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: Stephen Smalley <sds@tycho.nsa.gov>
cc: linux-security-module@vger.kernel.org
---

 include/linux/lsm_hooks.h | 14 ++++++++++++++
 include/linux/security.h | 15 ++++++++++++++-
 security/security.c | 9 +++++++++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index b0cdefcda4e6..257d803dcf6f 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1424,6 +1424,12 @@
  *	Check to see if a process is allowed to watch for event notifications
  *	from devices (as a global set).
  *
+ * @post_notification:
+ *	Check to see if a watch notification can be posted to a particular
+ *	queue.
+ *	@w_cred: The credentials of whoever set the watch.
+ *	@cred: The event-triggerer's credentials
+ *	@n: The notification being posted
  *
  * Security hooks for using the eBPF maps and programs functionalities through
  * eBPF syscalls.
@@ -1706,6 +1712,11 @@ union security_list_options {
 #ifdef CONFIG_DEVICE_NOTIFICATIONS
 	int (*watch_devices)(void);
 #endif
+#ifdef CONFIG_WATCH_QUEUE
+	int (*post_notification)(const struct cred *w_cred,
+				 const struct cred *cred,
+				 struct watch_notification *n);
+#endif
 
 #ifdef CONFIG_SECURITY_NETWORK
 	int (*unix_stream_connect)(struct sock *sock, struct sock *other,
@@ -1988,6 +1999,9 @@ struct security_hook_heads {
 #ifdef CONFIG_DEVICE_NOTIFICATIONS
 	struct hlist_head watch_devices;
 #endif
+#ifdef CONFIG_WATCH_QUEUE
+	struct hlist_head post_notification;
+#endif /* CONFIG_WATCH_QUEUE */
 #ifdef CONFIG_SECURITY_NETWORK
 	struct hlist_head unix_stream_connect;
 	struct hlist_head unix_may_send;
diff --git a/include/linux/security.h b/include/linux/security.h
index 3be44354d308..24c54b9ff0a1 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -57,6 +57,8 @@ struct mm_struct;
 struct fs_context;
 struct fs_parameter;
 enum fs_value_type;
+struct watch;
+struct watch_notification;
 
 /* Default (no) options for the capable function */
 #define CAP_OPT_NONE 0x0
@@ -1222,6 +1224,18 @@ static inline int security_watch_devices(void)
 	return 0;
 }
 #endif
+#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE)
+int security_post_notification(const struct cred *w_cred,
+			       const struct cred *cred,
+			       struct watch_notification *n);
+#else
+static inline int security_post_notification(const struct cred *w_cred,
+					     const struct cred *cred,
+					     struct watch_notification *n)
+{
+	return 0;
+}
+#endif
 
 #ifdef CONFIG_SECURITY_NETWORK
 
@@ -1847,4 +1861,3 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
 #endif /* CONFIG_BPF_SYSCALL */
 
 #endif /* ! __LINUX_SECURITY_H */
-
diff --git a/security/security.c b/security/security.c
index 007eb48bc848..b719c5a5b2ba 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1916,6 +1916,15 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
 }
 EXPORT_SYMBOL(security_inode_getsecctx);
 
+#ifdef CONFIG_WATCH_QUEUE
+int security_post_notification(const struct cred *w_cred,
+			       const struct cred *cred,
+			       struct watch_notification *n)
+{
+	return call_int_hook(post_notification, 0, w_cred, cred, n);
+}
+#endif /* CONFIG_WATCH_QUEUE */
+
 #ifdef CONFIG_KEY_NOTIFICATIONS
 int security_watch_key(struct key *key)
 {

^ permalink raw reply related	[flat|nested] 234+ messages in thread
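As a rough illustration of how the three arguments to post_notification could be used, here is a toy userspace model, not code from this series. The call site is not part of patch 03, so the sketch assumes the queue code simply skips a watcher when the hook returns non-zero. The label-comparison policy, the stand-in struct cred, init_cred_model, struct watcher and post_to_watchers() are all hypothetical and do not describe what SELinux or Smack actually do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for kernel types; the label field is purely illustrative. */
struct cred { const char *label; };
struct watch_notification { unsigned int type; };

/* Models init_cred, which the patch says may be used for system events. */
static const struct cred init_cred_model = { "system" };

/* Toy policy with the same signature as the post_notification hook. */
static int toy_post_notification(const struct cred *w_cred,
				 const struct cred *cred,
				 struct watch_notification *n)
{
	(void)n;	/* a real policy could also inspect the message */
	if (cred == &init_cred_model)
		return 0;	/* toy rule: system events always pass */
	if (strcmp(w_cred->label, cred->label) == 0)
		return 0;	/* toy rule: same label may notify */
	return -EACCES;
}

/* Hypothetical watcher record: who set the watch, and a delivery counter. */
struct watcher {
	const struct cred	*cred;
	unsigned int		delivered;
};

/* Offer one notification to every watcher; return how many accepted it. */
static unsigned int post_to_watchers(struct watcher *w, size_t nr,
				     const struct cred *cred,
				     struct watch_notification *n)
{
	unsigned int count = 0;

	assert(w != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (toy_post_notification(w[i].cred, cred, n) != 0)
			continue;	/* denied: this watcher never sees it */
		w[i].delivered++;
		count++;
	}
	return count;
}
```

The point of passing both sets of credentials is visible in the loop: the decision is made per watcher, so one event can be delivered to some queues and withheld from others.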
* [PATCH 03/11] security: Add a hook for the point of notification insertion [ver #8] @ 2019-09-04 22:16 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw) To: keyrings, linux-usb, linux-block Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Add a security hook that allows an LSM to rule on whether a notification message is allowed to be inserted into a particular watch queue. The hook is given the following information: (1) The credentials of the triggerer (which may be init_cred for a system notification, eg. a hardware error). (2) The credentials of the whoever set the watch. (3) The notification message. Signed-off-by: David Howells <dhowells@redhat.com> cc: Casey Schaufler <casey@schaufler-ca.com> cc: Stephen Smalley <sds@tycho.nsa.gov> cc: linux-security-module@vger.kernel.org --- include/linux/lsm_hooks.h | 14 ++++++++++++++ include/linux/security.h | 15 ++++++++++++++- security/security.c | 9 +++++++++ 3 files changed, 37 insertions(+), 1 deletion(-) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index b0cdefcda4e6..257d803dcf6f 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1424,6 +1424,12 @@ * Check to see if a process is allowed to watch for event notifications * from devices (as a global set). * + * @post_notification: + * Check to see if a watch notification can be posted to a particular + * queue. + * @w_cred: The credentials of the whoever set the watch. + * @cred: The event-triggerer's credentials + * @n: The notification being posted * * Security hooks for using the eBPF maps and programs functionalities through * eBPF syscalls. 
@@ -1706,6 +1712,11 @@ union security_list_options { #ifdef CONFIG_DEVICE_NOTIFICATIONS int (*watch_devices)(void); #endif +#ifdef CONFIG_WATCH_QUEUE + int (*post_notification)(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n); +#endif #ifdef CONFIG_SECURITY_NETWORK int (*unix_stream_connect)(struct sock *sock, struct sock *other, @@ -1988,6 +1999,9 @@ struct security_hook_heads { #ifdef CONFIG_DEVICE_NOTIFICATIONS struct hlist_head watch_devices; #endif +#ifdef CONFIG_WATCH_QUEUE + struct hlist_head post_notification; +#endif /* CONFIG_WATCH_QUEUE */ #ifdef CONFIG_SECURITY_NETWORK struct hlist_head unix_stream_connect; struct hlist_head unix_may_send; diff --git a/include/linux/security.h b/include/linux/security.h index 3be44354d308..24c54b9ff0a1 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -57,6 +57,8 @@ struct mm_struct; struct fs_context; struct fs_parameter; enum fs_value_type; +struct watch; +struct watch_notification; /* Default (no) options for the capable function */ #define CAP_OPT_NONE 0x0 @@ -1222,6 +1224,18 @@ static inline int security_watch_devices(void) return 0; } #endif +#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE) +int security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n); +#else +static inline int security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n) +{ + return 0; +} +#endif #ifdef CONFIG_SECURITY_NETWORK @@ -1847,4 +1861,3 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux) #endif /* CONFIG_BPF_SYSCALL */ #endif /* ! 
__LINUX_SECURITY_H */ - diff --git a/security/security.c b/security/security.c index 007eb48bc848..b719c5a5b2ba 100644 --- a/security/security.c +++ b/security/security.c @@ -1916,6 +1916,15 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen) } EXPORT_SYMBOL(security_inode_getsecctx); +#ifdef CONFIG_WATCH_QUEUE +int security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n) +{ + return call_int_hook(post_notification, 0, w_cred, cred, n); +} +#endif /* CONFIG_WATCH_QUEUE */ + #ifdef CONFIG_KEY_NOTIFICATIONS int security_watch_key(struct key *key) { ^ permalink raw reply related [flat|nested] 234+ messages in thread
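For readers unfamiliar with how an LSM hook of this shape is consulted, the following is a minimal userspace sketch, not kernel code. It models the dispatch used by security_post_notification() above: start from 0 (allow), call each registered hook in turn, and stop at the first non-zero return. The struct definitions, register_hook(), model_post_notification() and same_uid_policy() are all hypothetical stand-ins invented for illustration; the policy itself is made up and is not what SELinux or Smack implement.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types; illustration only. */
struct cred {
	unsigned int uid;
};

struct watch_notification {
	unsigned int type;
	unsigned int subtype;
	unsigned int info;
};

typedef int (*post_notification_hook)(const struct cred *w_cred,
				      const struct cred *cred,
				      struct watch_notification *n);

#define MAX_HOOKS 4
static post_notification_hook hooks[MAX_HOOKS];
static size_t nr_hooks;

/* Append a hook to the (hypothetical) list; returns 0 or -ENOSPC. */
int register_hook(post_notification_hook h)
{
	assert(h != NULL);
	if (nr_hooks == MAX_HOOKS)
		return -ENOSPC;
	hooks[nr_hooks++] = h;
	return 0;
}

/*
 * Model of the dispatch: the default result is 0 (allow); each hook is
 * called in registration order and the first non-zero result wins.
 */
int model_post_notification(const struct cred *w_cred,
			    const struct cred *cred,
			    struct watch_notification *n)
{
	int rc = 0;
	size_t i;

	for (i = 0; i < nr_hooks; i++) {
		rc = hooks[i](w_cred, cred, n);
		if (rc != 0)
			break;
	}
	return rc;
}

/*
 * Made-up example policy: deliver the event if the triggerer is uid 0
 * (standing in for init_cred) or has the same uid as the watch setter.
 */
int same_uid_policy(const struct cred *w_cred,
		    const struct cred *cred,
		    struct watch_notification *n)
{
	(void)n;
	if (cred->uid == 0 || cred->uid == w_cred->uid)
		return 0;
	return -EACCES;
}
```

With no hooks registered, every notification is allowed, which mirrors the static inline fallback that returns 0 when the hook is compiled out. A caller that gets a negative result simply skips that queue, as __post_watch_notification() does in the next patch.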
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #8] 2019-09-04 22:15 ` David Howells (?) @ 2019-09-04 22:16 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw) To: keyrings, linux-usb, linux-block Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, linux-security-module, linux-fsdevel, linux-api, linux-security-module, linux-kernel Add a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. 
mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 898 ++++++++++++++++++++++++++++++++++ include/linux/sched/user.h | 3 include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 8 files changed, 1472 insertions(+), 1 deletion(-) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! 
fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============================ +Mappable notifications queue +============================ + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. This can be used in conjunction with:: + + * Key/keyring notifications + + * General device event notifications + + +The notifications buffers can be enabled by: + + "Device Drivers"/"Misc devices"/"Mappable notification queue" + (CONFIG_WATCH_QUEUE) + +This document has the following sections: + +.. contents:: :local: + + +Overview +======== + +This facility appears as a misc device file that is opened and then mapped and +polled. Each time it is opened, it creates a new buffer specific to the +returned file descriptor. Then, when the opening process sets watches, it +indicates the particular buffer it wants notifications from that watch to be +written into. Note that there are no read() and write() methods (except for +debugging). The user is expected to access the ring directly and to use poll +to wait for new data. + +If a watch is in place, notifications are only written into the buffer if the +filter criteria are passed and if there's sufficient space available in the +ring. If neither of those is so, a notification will be discarded. In the +latter case, an overrun indicator will also be set. + +Note that when producing a notification, the kernel does not wait for the +consumers to collect it, but rather just continues on. This means that +notifications can be generated whilst spinlocks are held and also protects the +kernel from being held up indefinitely by a userspace malfunction. + +As far as the ring goes, the head index belongs to the kernel and the tail +index belongs to userspace. 
The kernel will refuse to write anything if the +tail index becomes invalid. Userspace *must* use appropriate memory barriers +between reading or updating the tail index and reading the ring. + + +Record Structure +================ + +Notification records in the ring may occupy a variable number of slots within +the buffer, beginning with a 1-slot header:: + + struct watch_notification { + __u32 type:24; + __u32 subtype:8; + __u32 info; + } __attribute__((aligned(WATCH_LENGTH_GRANULARITY))); + +"type" indicates the source of the notification record and "subtype" indicates +the type of record from that source (see the Watch Sources section below). The +type may also be "WATCH_TYPE_META". This is a special record type generated +internally by the watch queue driver itself. There are two subtypes, one of +which indicates records that should be just skipped (padding or metadata): + + * WATCH_META_SKIP_NOTIFICATION + * WATCH_META_REMOVAL_NOTIFICATION + +The former indicates a record that should just be skipped and the latter +indicates that an object on which a watch was installed was removed or +destroyed. + +"info" indicates a bunch of things, including: + + * The length of the record in units of buffer slots (mask with + WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates + the size of the record, which may be between 1 and 63 slots. To turn this + into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY. + + * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT). + This indicates the caller's ID for the watch, which may be between 0 + and 255. Multiple watches may share a queue, and this provides a means to + distinguish them. + + * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST) + that indicates that some notifications were lost for some reason, including + buffer overrun, insufficient memory and inconsistent tail index. + + * A type-specific field (WATCH_INFO_TYPE_INFO). 
This is set by the + notification producer to indicate some meaning specific to the type and + subtype. + +Everything in info apart from the length can be used for filtering. + + +Ring Structure +============== + +The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The +caller uses an ioctl() to set the size of the ring after opening and this must +be a power-of-2 multiple of the system page size (so that the mask can be used +with AND). + +The head and tail indices are stored in the first two slots in the ring, which +are marked out as a skippable entry:: + + struct watch_queue_buffer { + union { + struct { + struct watch_notification watch; + volatile __u32 head; + volatile __u32 tail; + __u32 mask; + } meta; + struct watch_notification slots[0]; + }; + }; + +In "meta.watch", type will be set to WATCH_TYPE_META and subtype to +WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just +skip this record. Also, because this record is here, records cannot wrap round +the end of the buffer, so a skippable padding element will be inserted at the +end of the buffer if needed. Thus the contents of a notification record in the +buffer are always contiguous. + +"meta.mask" is an AND'able mask to turn the index counters into slots array +indices. + +The buffer is empty if "meta.head" == "meta.tail". + +[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into +"slots[]" not byte offsets into the buffer. + +[!] NOTE that userspace must never change the head pointer. This belongs to +the kernel and will be updated by that. The kernel will never change the tail +pointer. + +[!] NOTE that userspace must never AND-off the tail pointer before updating it, +but should just keep adding to it and letting it wrap naturally. The value +*should* be masked off when used as an index into slots[]. + +[!] 
NOTE that if the distance between head and tail becomes too great, the +kernel will assume the buffer is full and write no more until the issue is +resolved. + + +Watch List (Notification Source) API +==================================== + +A "watch list" is a list of watchers that are subscribed to a source of +notifications. A list may be attached to an object (say a key or a superblock) +or may be global (say for device events). From a userspace perspective, a +non-global watch list is typically referred to by reference to the object it +belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to +watch that specific key). + +To manage a watch list, the following functions are provided: + + * ``void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *wlist));`` + + Initialise a watch list. If ``release_watch`` is not NULL, then this + indicates a function that should be called when the watch_list object is + destroyed to discard any references the watch list holds on the watched + object. + + * ``void remove_watch_list(struct watch_list *wlist);`` + + This removes all of the watches subscribed to a watch_list and frees them + and then destroys the watch_list object itself. + + +Watch Queue (Notification Buffer) API +===================================== + +A "watch queue" is the buffer allocated by or on behalf of the application that +notification records will be written into. The workings of this are hidden +entirely inside of the watch_queue device driver, but it is necessary to gain a +reference to it to place a watch. These can be managed with: + + * ``struct watch_queue *get_watch_queue(int fd);`` + + Since watch queues are indicated to the kernel by the fd of the character + device that implements the buffer, userspace must hand that fd through a + system call. This can be used to look up an opaque pointer to the watch + queue from the system call. 
+ + * ``void put_watch_queue(struct watch_queue *wqueue);`` + + This discards the reference obtained from ``get_watch_queue()``. + + +Watch Subscription API +====================== + +A "watch" is a subscription on a watch list, indicating the watch queue, and +thus the buffer, into which notification records should be written. The watch +queue object may also carry filtering rules for that object, as set by +userspace. Some parts of the watch struct can be set by the driver:: + + struct watch { + union { + u32 info_id; /* ID to be OR'd in to info field */ + ... + }; + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + ... + }; + +The ``info_id`` value should be an 8-bit number obtained from userspace and +shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of +struct watch_notification::info when and if the notification is written into +the associated watch queue buffer. + +The ``private`` field is the driver's data associated with the watch_list and +is cleaned up by the ``watch_list::release_watch()`` method. + +The ``id`` field is the source's ID. Notifications that are posted with a +different ID are ignored. + +The following functions are provided to manage watches: + + * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);`` + + Initialise a watch object, setting its pointer to the watch queue, using + appropriate barriering to avoid lockdep complaints. + + * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);`` + + Subscribe a watch to a watch list (notification source). The + driver-settable fields in the watch struct must have been set before this + is called. + + * ``int remove_watch_from_object(struct watch_list *wlist, + struct watch_queue *wqueue, + u64 id, false);`` + + Remove a watch from a watch list, where the watch must match the specified + watch queue (``wqueue``) and object identifier (``id``). 
A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to + indicate that the watch got removed. + + * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);`` + + Remove all the watches from a watch list. It is expected that this will be + called preparatory to destruction and that the watch list will be + inaccessible to new watches by this point. A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each + subscribed watch to indicate that the watch got removed. + + +Notification Posting API +======================== + +To post a notification to a watch list so that the subscribed watches can see it, +the following function should be used:: + + void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id); + +The notification should be preformatted and a pointer to the header (``n``) +should be passed in. The notification may be larger than this and the size in +units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``. + +The ``cred`` struct indicates the credentials of the source (subject) and is +passed to the LSMs, such as SELinux, to allow or suppress the recording of the +note in each individual queue according to the credentials of that queue +(object). + +The ``id`` is the ID of the source object (such as the serial number on a key). +Only watches that have the same ID set in them will see this notification. + + +Watch Sources +============= + +Any particular buffer can be fed from multiple sources. Sources include: + + * WATCH_TYPE_KEY_NOTIFY + + Notifications of this type indicate changes to keys and keyrings, including + the changes of keyring contents or the attributes of keys. + + See Documentation/security/keys/core.rst for more information. + + * WATCH_TYPE_BLOCK_NOTIFY + + Notifications of this type indicate block layer events, such as I/O errors + or temporary link loss. 
Watches of this type are set on a global queue. + + +Event Filtering +=============== + +Once a watch queue has been created, a set of filters can be applied to limit +the events that are received using:: + + struct watch_notification_filter filter = { + ... + }; + ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) + +The filter description is a variable of type:: + + struct watch_notification_filter { + __u32 nr_filters; + __u32 __reserved; + struct watch_notification_type_filter filters[]; + }; + +Where "nr_filters" is the number of filters in filters[] and "__reserved" +should be 0. The "filters" array has elements of the following type:: + + struct watch_notification_type_filter { + __u32 type; + __u32 info_filter; + __u32 info_mask; + __u32 subtype_filter[8]; + }; + +Where: + + * ``type`` is the event type to filter for and should be something like + "WATCH_TYPE_KEY_NOTIFY" + + * ``info_filter`` and ``info_mask`` act as a filter on the info field of the + notification record. The notification is only written into the buffer if:: + + (watch.info & info_mask) == info_filter + + This could be used, for example, to ignore events that are not exactly on + the watched point in a mount tree. + + * ``subtype_filter`` is a bitmask indicating the subtypes that are of + interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to + subtype 1, and so on. + +If the argument to the ioctl() is NULL, then the filters will be removed and +all events from the watched sources will come through. + + +Waiting For Events +================== + +The file descriptor that holds the buffer may be used with poll() and similar. +POLLIN and POLLRDNORM are set if the buffer indices differ. POLLERR is set if +the buffer indices are further apart than the size of the buffer. Wake-up +events are only generated if the buffer is transitioned from an empty state. 
+ + +Userspace Code Example +====================== + +A buffer is created with something like the following:: + + fd = open("/dev/watch_queue", O_RDWR); + + #define BUF_SIZE 4 + ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + +It can then be set to receive keyring change notifications and device event +notifications:: + + keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01); + + watch_devices(fd, 0x2); + +The notifications can then be consumed by something like the following:: + + extern void saw_key_change(struct watch_notification *n); + extern void saw_block_event(struct watch_notification *n); + extern void saw_usb_event(struct watch_notification *n); + + static int consumer(int fd, struct watch_queue_buffer *buf) + { + struct watch_notification *n; + struct pollfd p[1]; + unsigned int len, head, tail, mask = buf->meta.mask; + + for (;;) { + p[0].fd = fd; + p[0].events = POLLIN | POLLERR; + p[0].revents = 0; + + if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR) + goto went_wrong; + + while (head = _atomic_load_acquire(buf->meta.head), + tail = buf->meta.tail, + tail != head + ) { + n = &buf->slots[tail & mask]; + len = (n->info & WATCH_INFO_LENGTH) >> + WATCH_INFO_LENGTH__SHIFT; + if (len == 0) + goto went_wrong; + + switch (n->type) { + case WATCH_TYPE_KEY_NOTIFY: + saw_key_change(n); + break; + case WATCH_TYPE_BLOCK_NOTIFY: + saw_block_event(n); + break; + case WATCH_TYPE_USB_NOTIFY: + saw_usb_event(n); + break; + } + + tail += len; + _atomic_store_release(buf->meta.tail, tail); + } + } + + went_wrong: + return 0; + } + +Note the memory barriers when loading the head pointer and storing the tail +pointer! 
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..b3fc59b4ef6c --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,898 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or + * userspace mucked up the ring indices). + */ +static bool write_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + struct watch_queue_buffer *buf = wqueue->buffer; + struct watch_notification *p; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + unsigned int size = wqueue->size, mask = size - 1; + unsigned int len; + unsigned int ring_tail, tail, head, used, gap, h; + + /* Barrier against userspace, ordering data read before tail read */ + ring_tail = READ_ONCE(buf->meta.tail); + + head = READ_ONCE(buf->meta.head); + used = head - ring_tail; + + /* Check to see if userspace mucked up the pointers */ + if (used >= size) + goto lost_event; /* Inconsistent */ + tail = ring_tail & mask; + if (tail > 0 && tail < metalen) + goto lost_event; /* Inconsistent */ + + len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + h = head & mask; + if (h >= tail) { + /* Head is at or after tail in the buffer. There may then be + * two gaps: one to the end of buffer and one at the beginning + * of the buffer between the metadata block and the tail + * pointer. + */ + gap = size - h; + if (len > gap) { + /* Not enough space in the post-head gap; we need to + * wrap. When wrapping, we will have to skip the + * metadata at the beginning of the buffer. 
+ */ + if (len > tail - metalen) + goto lost_event; /* Overrun */ + + /* Fill the space at the end of the page */ + p = &buf->slots[h]; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = gap << WATCH_INFO_LENGTH__SHIFT; + head += gap; + h = 0; + if (h >= tail) + goto lost_event; /* Overrun */ + } + } + + if (h == 0) { + /* Reset and skip the header metadata */ + p = &buf->meta.watch; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = metalen << WATCH_INFO_LENGTH__SHIFT; + head += metalen; + h = metalen; + if (h == tail) + goto lost_event; /* Overrun */ + } + + if (h < tail) { + /* Head is before tail in the buffer. */ + gap = tail - h; + if (len > gap) + goto lost_event; /* Overrun */ + } + + n->info |= wqueue->flag_next; + wqueue->flag_next = 0; + p = &buf->slots[h]; + memcpy(p, n, len * gran); + head += len; + + /* Barrier against userspace, ordering head update after data write. */ + smp_store_release(&buf->meta.head, head); + if (used == 0) + wake_up(&wqueue->waiters); + return true; + +lost_event: + WRITE_ONCE(buf->meta.watch.info, + buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST); + return false; +} + +/* + * Post a notification to a watch queue. + */ +static bool post_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + bool done = false; + + if (!wqueue->buffer) + return false; + + spin_lock_bh(&wqueue->lock); /* Protect head pointer */ + + if (!wqueue->defunct) + done = write_one_notification(wqueue, n); + spin_unlock_bh(&wqueue->lock); + return done; +} + +/* + * Apply filter rules to a notification. 
+ */ +static bool filter_watch_notification(const struct watch_filter *wf, + const struct watch_notification *n) +{ + const struct watch_type_filter *wt; + unsigned int st_bits = sizeof(wt->subtype_filter[0]) * 8; + unsigned int st_index = n->subtype / st_bits; + unsigned int st_bit = 1U << (n->subtype % st_bits); + int i; + + if (!test_bit(n->type, wf->type_filter)) + return false; + + for (i = 0; i < wf->nr_filters; i++) { + wt = &wf->filters[i]; + if (n->type == wt->type && + (wt->subtype_filter[st_index] & st_bit) && + (n->info & wt->info_mask) == wt->info_filter) + return true; + } + + return false; /* If there is a filter, the default is to reject. */ +} + +/** + * __post_watch_notification - Post an event notification + * @wlist: The watch list to post the event to. + * @n: The notification record to post. + * @cred: The creds of the process that triggered the notification. + * @id: The ID to match on the watch. + * + * Post a notification of an event into a set of watch queues and let the users + * know. + * + * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and + * should be in units of sizeof(*n). 
+ */ +void __post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + const struct watch_filter *wf; + struct watch_queue *wqueue; + struct watch *watch; + + if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) { + WARN_ON(1); + return; + } + + rcu_read_lock(); + + hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) { + if (watch->id != id) + continue; + n->info &= ~WATCH_INFO_ID; + n->info |= watch->info_id; + + wqueue = rcu_dereference(watch->queue); + wf = rcu_dereference(wqueue->filter); + if (wf && !filter_watch_notification(wf, n)) + continue; + + if (security_post_notification(watch->cred, cred, n) < 0) + continue; + + post_one_notification(wqueue, n); + } + + rcu_read_unlock(); +} +EXPORT_SYMBOL(__post_watch_notification); + +/* + * Allow the queue to be polled. + */ +static __poll_t watch_queue_poll(struct file *file, poll_table *wait) +{ + struct watch_queue *wqueue = file->private_data; + struct watch_queue_buffer *buf = wqueue->buffer; + unsigned int head, tail; + __poll_t mask = 0; + + if (!buf) + return EPOLLERR; + + poll_wait(file, &wqueue->waiters, wait); + + head = READ_ONCE(buf->meta.head); + tail = READ_ONCE(buf->meta.tail); + if (head != tail) + mask |= EPOLLIN | EPOLLRDNORM; + if (head - tail > wqueue->size) + mask |= EPOLLERR; + return mask; +} + +static int watch_queue_set_page_dirty(struct page *page) +{ + SetPageDirty(page); + return 0; +} + +static const struct address_space_operations watch_queue_aops = { + .set_page_dirty = watch_queue_set_page_dirty, +}; + +static vm_fault_t watch_queue_fault(struct vm_fault *vmf) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + page = wqueue->pages[vmf->pgoff]; + get_page(page); + if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { + put_page(page); + return VM_FAULT_RETRY; + } + vmf->page = page; + return VM_FAULT_LOCKED; +} + +static int 
watch_queue_account_mem(struct watch_queue *wqueue, + unsigned long nr_pages) +{ + struct user_struct *user = wqueue->owner; + unsigned long page_limit, cur_pages, new_pages; + + /* Don't allow more pages than we can safely lock */ + page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + cur_pages = atomic_long_read(&user->locked_vm); + + do { + new_pages = cur_pages + nr_pages; + if (new_pages > page_limit && !capable(CAP_IPC_LOCK)) + return -ENOMEM; + } while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages, + new_pages)); + + wqueue->nr_pages = nr_pages; + return 0; +} + +static void watch_queue_unaccount_mem(struct watch_queue *wqueue) +{ + struct user_struct *user = wqueue->owner; + + if (wqueue->nr_pages) { + atomic_long_sub(wqueue->nr_pages, &user->locked_vm); + wqueue->nr_pages = 0; + } +} + +static void watch_queue_map_pages(struct vm_fault *vmf, + pgoff_t start_pgoff, pgoff_t end_pgoff) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + rcu_read_lock(); + + do { + page = wqueue->pages[start_pgoff]; + if (trylock_page(page)) { + vm_fault_t ret; + get_page(page); + ret = alloc_set_pte(vmf, NULL, page); + if (ret != 0) + put_page(page); + + unlock_page(page); + } + } while (++start_pgoff < end_pgoff); + + rcu_read_unlock(); +} + +static const struct vm_operations_struct watch_queue_vm_ops = { + .fault = watch_queue_fault, + .map_pages = watch_queue_map_pages, +}; + +/* + * Map the buffer. 
+ */ +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + u8 nr_pages; + + inode_lock(inode); + nr_pages = wqueue->nr_pages; + inode_unlock(inode); + + if (nr_pages == 0 || + vma->vm_pgoff != 0 || + vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE || + !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED))) + return -EINVAL; + + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_ops = &watch_queue_vm_ops; + return 0; +} + +/* + * Allocate the required number of pages. + */ +static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages) +{ + struct watch_queue_buffer *buf; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + int i; + + BUILD_BUG_ON(gran != sizeof(__u64)); + + if (wqueue->buffer) + return -EBUSY; + + if (nr_pages == 0 || + nr_pages > 16 || /* TODO: choose a better hard limit */ + !is_power_of_2(nr_pages)) + return -EINVAL; + + if (watch_queue_account_mem(wqueue, nr_pages) < 0) + goto err; + + wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!wqueue->pages) + goto err_unaccount; + + for (i = 0; i < nr_pages; i++) { + wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!wqueue->pages[i]) + goto err_some_pages; + wqueue->pages[i]->mapping = &wqueue->mapping; + SetPageUptodate(wqueue->pages[i]); + } + + buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED); + if (!buf) + goto err_some_pages; + + wqueue->buffer = buf; + wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification)); + + /* The first four slots in the buffer contain metadata about the ring, + * including the head and tail indices and mask. 
+ */ + buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT; + buf->meta.watch.type = WATCH_TYPE_META; + buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION; + buf->meta.mask = wqueue->size - 1; + buf->meta.head = metalen; + buf->meta.tail = metalen; + return 0; + +err_some_pages: + for (i--; i >= 0; i--) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + put_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + wqueue->pages = NULL; +err_unaccount: + watch_queue_unaccount_mem(wqueue); +err: + return -ENOMEM; +} + +/* + * Set the filter on a watch queue. + */ +static long watch_queue_set_filter(struct inode *inode, + struct watch_queue *wqueue, + struct watch_notification_filter __user *_filter) +{ + struct watch_notification_type_filter *tf; + struct watch_notification_filter filter; + struct watch_type_filter *q; + struct watch_filter *wfilter; + int ret, nr_filter = 0, i; + + if (!_filter) { + /* Remove the old filter */ + wfilter = NULL; + goto set; + } + + /* Grab the user's filter specification */ + if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) + return -EFAULT; + if (filter.nr_filters == 0 || + filter.nr_filters > 16 || + filter.__reserved != 0) + return -EINVAL; + + tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + if (IS_ERR(tf)) + return PTR_ERR(tf); + + ret = -EINVAL; + for (i = 0; i < filter.nr_filters; i++) { + if ((tf[i].info_filter & ~tf[i].info_mask) || + tf[i].info_mask & WATCH_INFO_LENGTH) + goto err_filter; + /* Ignore any unknown types */ + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + nr_filter++; + } + + /* Now we need to build the internal filter from only the relevant + * user-specified filters. 
+ */ + ret = -ENOMEM; + wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL); + if (!wfilter) + goto err_filter; + wfilter->nr_filters = nr_filter; + + q = wfilter->filters; + for (i = 0; i < filter.nr_filters; i++) { + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + + q->type = tf[i].type; + q->info_filter = tf[i].info_filter; + q->info_mask = tf[i].info_mask; + q->subtype_filter[0] = tf[i].subtype_filter[0]; + __set_bit(q->type, wfilter->type_filter); + q++; + } + + kfree(tf); +set: + inode_lock(inode); + rcu_swap_protected(wqueue->filter, wfilter, + lockdep_is_held(&inode->i_rwsem)); + inode_unlock(inode); + if (wfilter) + kfree_rcu(wfilter, rcu); + return 0; + +err_filter: + kfree(tf); + return ret; +} + +/* + * Set parameters. + */ +static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + long ret; + + switch (cmd) { + case IOC_WATCH_QUEUE_SET_SIZE: + inode_lock(inode); + ret = watch_queue_set_size(wqueue, arg); + inode_unlock(inode); + return ret; + + case IOC_WATCH_QUEUE_SET_FILTER: + ret = watch_queue_set_filter( + inode, wqueue, + (struct watch_notification_filter __user *)arg); + return ret; + + default: + return -ENOTTY; + } +} + +/* + * Open the file. 
+ */ +static int watch_queue_open(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue; + + wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL); + if (!wqueue) + return -ENOMEM; + + wqueue->mapping.a_ops = &watch_queue_aops; + wqueue->mapping.i_mmap = RB_ROOT_CACHED; + init_rwsem(&wqueue->mapping.i_mmap_rwsem); + spin_lock_init(&wqueue->mapping.private_lock); + + kref_init(&wqueue->usage); + spin_lock_init(&wqueue->lock); + init_waitqueue_head(&wqueue->waiters); + wqueue->owner = get_uid(file->f_cred->user); + + file->private_data = wqueue; + return 0; +} + +static void __put_watch_queue(struct kref *kref) +{ + struct watch_queue *wqueue = + container_of(kref, struct watch_queue, usage); + struct watch_filter *wfilter; + + wfilter = rcu_access_pointer(wqueue->filter); + if (wfilter) + kfree_rcu(wfilter, rcu); + free_uid(wqueue->owner); + kfree_rcu(wqueue, rcu); +} + +/** + * put_watch_queue - Dispose of a ref on a watchqueue. + * @wqueue: The watch queue to unref. + */ +void put_watch_queue(struct watch_queue *wqueue) +{ + kref_put(&wqueue->usage, __put_watch_queue); +} +EXPORT_SYMBOL(put_watch_queue); + +static void free_watch(struct rcu_head *rcu) +{ + struct watch *watch = container_of(rcu, struct watch, rcu); + + put_watch_queue(rcu_access_pointer(watch->queue)); + put_cred(watch->cred); +} + +static void __put_watch(struct kref *kref) +{ + struct watch *watch = container_of(kref, struct watch, usage); + + call_rcu(&watch->rcu, free_watch); +} + +/* + * Discard a watch. + */ +static void put_watch(struct watch *watch) +{ + kref_put(&watch->usage, __put_watch); +} + +/** + * init_watch - Initialise a watch + * @watch: The watch to initialise. + * @wqueue: The queue to assign. + * + * Initialise a watch and set the watch queue. 
+ */ +void init_watch(struct watch *watch, struct watch_queue *wqueue) +{ + kref_init(&watch->usage); + INIT_HLIST_NODE(&watch->list_node); + INIT_HLIST_NODE(&watch->queue_node); + rcu_assign_pointer(watch->queue, wqueue); +} + +/** + * add_watch_to_object - Add a watch on an object to a watch list + * @watch: The watch to add + * @wlist: The watch list to add to + * + * @watch->queue must have been set to point to the queue to post notifications + * to and the watch list of the object to be watched. @watch->cred must also + * have been set to the appropriate credentials and a ref taken on them. + * + * The caller must pin the queue and the list both and must hold the list + * locked against racing watch additions/removals. + */ +int add_watch_to_object(struct watch *watch, struct watch_list *wlist) +{ + struct watch_queue *wqueue = rcu_access_pointer(watch->queue); + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + watch->cred = get_current_cred(); + rcu_assign_pointer(watch->watch_list, wlist); + + spin_lock_bh(&wqueue->lock); + kref_get(&wqueue->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_unlock_bh(&wqueue->lock); + + hlist_add_head(&watch->list_node, &wlist->watchers); + return 0; +} +EXPORT_SYMBOL(add_watch_to_object); + +/** + * remove_watch_from_object - Remove a watch or all watches from an object. + * @wlist: The watch list to remove from + * @wq: The watch queue of interest (ignored if @all is true) + * @id: The ID of the watch to remove (ignored if @all is true) + * @all: True to remove all objects + * + * Remove a specific watch or all watches from an object. A notification is + * sent to the watcher to tell them that this happened. 
+ */ +int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, + u64 id, bool all) +{ + struct watch_notification_removal n; + struct watch_queue *wqueue; + struct watch *watch; + int ret = -EBADSLT; + + rcu_read_lock(); + +again: + spin_lock(&wlist->lock); + hlist_for_each_entry(watch, &wlist->watchers, list_node) { + if (all || + (watch->id == id && rcu_access_pointer(watch->queue) == wq)) + goto found; + } + spin_unlock(&wlist->lock); + goto out; + +found: + ret = 0; + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + spin_unlock(&wlist->lock); + + /* We now own the reference on watch that used to belong to wlist. */ + + n.watch.type = WATCH_TYPE_META; + n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; + n.watch.info = watch->info_id | watch_sizeof(n.watch); + n.id = id; + if (id != 0) + n.watch.info = watch->info_id | watch_sizeof(n); + + wqueue = rcu_dereference(watch->queue); + + /* We don't need the watch list lock for the next bit as RCU is + * protecting *wqueue from deallocation. + */ + if (wqueue) { + post_one_notification(wqueue, &n.watch); + + spin_lock_bh(&wqueue->lock); + + if (!hlist_unhashed(&watch->queue_node)) { + hlist_del_init_rcu(&watch->queue_node); + put_watch(watch); + } + + spin_unlock_bh(&wqueue->lock); + } + + if (wlist->release_watch) { + void (*release_watch)(struct watch *); + + release_watch = wlist->release_watch; + rcu_read_unlock(); + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + + if (all && !hlist_empty(&wlist->watchers)) + goto again; +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(remove_watch_from_object); + +/* + * Remove all the watches that are contributory to a queue. This has the + * potential to race with removal of the watches by the destruction of the + * objects being watched or with the distribution of notifications. 
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */ +static int watch_queue_release(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue = file->private_data; + int i; + + watch_queue_clear(wqueue); + + if (wqueue->buffer) + vunmap(wqueue->buffer); + + for (i = 0; i < wqueue->nr_pages; i++) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + __free_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + watch_queue_unaccount_mem(wqueue); + put_watch_queue(wqueue); + return 0; +} + +static const struct file_operations watch_queue_fops = { + .owner = THIS_MODULE, + .open = watch_queue_open, + .release = watch_queue_release, + .unlocked_ioctl = watch_queue_ioctl, + .poll = watch_queue_poll, + .mmap = watch_queue_mmap, + .llseek = no_llseek, +}; + +/** + * get_watch_queue - Get a watch queue from its file descriptor. + * @fd: The fd to query. + */ +struct watch_queue *get_watch_queue(int fd) +{ + struct watch_queue *wqueue = ERR_PTR(-EBADF); + struct fd f; + + f = fdget(fd); + if (f.file) { + wqueue = ERR_PTR(-EINVAL); + if (f.file->f_op == &watch_queue_fops) { + wqueue = f.file->private_data; + kref_get(&wqueue->usage); + } + fdput(f); + } + + return wqueue; +} +EXPORT_SYMBOL(get_watch_queue); + +static struct miscdevice watch_queue_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "watch_queue", + .fops = &watch_queue_fops, + .mode = 0666, +}; +builtin_misc_device(watch_queue_dev); diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index 917d88edb7b9..126494d917bf 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -33,7 +33,8 @@ struct user_struct { kuid_t uid; #if defined(CONFIG_PERF_EVENTS) || defined(CONFIG_BPF_SYSCALL) || \ - defined(CONFIG_NET) || defined(CONFIG_IO_URING) + defined(CONFIG_NET) || defined(CONFIG_IO_URING) || \ + defined(CONFIG_WATCH_QUEUE) atomic_long_t locked_vm; #endif diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h new file mode 100644 index 000000000000..34d7915cc5b3 --- 
/dev/null +++ b/include/linux/watch_queue.h @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. 
+ */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + +static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. 
+ */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. + * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + +#endif /* _UAPI_LINUX_WATCH_QUEUE_H */
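A rough userspace sketch of how a filter for IOC_WATCH_QUEUE_SET_FILTER might be assembled from the UAPI structures in this patch. Everything prefixed ex_/EX_ is hypothetical and exists only for illustration: the structs are local mirrors of watch_notification_type_filter and watch_notification_filter so that the sketch builds without the patched headers, and the limit of 16 entries is taken from the check in watch_queue_set_filter().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical local mirrors of the UAPI structs added by the patch. */
struct ex_type_filter {
	uint32_t type;              /* Type to apply filter to */
	uint32_t info_filter;       /* Required value of the masked info bits */
	uint32_t info_mask;         /* Which info bits are compared */
	uint32_t subtype_filter[8]; /* One bit per subtype, 0-255 */
};

struct ex_filter {
	uint32_t nr_filters;
	uint32_t reserved;          /* Must be 0 */
	struct ex_type_filter filters[];
};

#define EX_MAX_FILTERS 16u      /* limit checked in watch_queue_set_filter() */

/* Allocate a zeroed filter with room for nr entries, or NULL if out of range. */
static struct ex_filter *ex_filter_alloc(uint32_t nr)
{
	struct ex_filter *f;

	if (nr == 0 || nr > EX_MAX_FILTERS)
		return NULL;
	f = calloc(1, sizeof(*f) + nr * sizeof(f->filters[0]));
	if (f)
		f->nr_filters = nr;
	return f;
}

/* Mark one subtype as accepted by this filter entry. */
static void ex_filter_allow_subtype(struct ex_type_filter *tf,
				    unsigned int subtype)
{
	assert(subtype < 256);
	tf->subtype_filter[subtype / 32] |= UINT32_C(1) << (subtype % 32);
}

/* Mirror of one kernel-side sanity check: every bit set in info_filter must
 * also be set in info_mask.  (The kernel additionally rejects masks that
 * cover the length bits; that check is omitted here.)
 */
static int ex_filter_entry_valid(const struct ex_type_filter *tf)
{
	return (tf->info_filter & ~tf->info_mask) == 0;
}
```

With the real headers, the filled-in structure would be handed to ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, f). Note that watch_queue_set_filter() in this version appears to copy only subtype_filter[0] into the internal filter, so subtypes above 31 may not be honoured yet.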
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #8] @ 2019-09-04 22:16 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw) To: keyrings, linux-usb, linux-block Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Implement a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink). 
The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 898 ++++++++++++++++++++++++++++++++++ include/linux/sched/user.h | 3 include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 8 files changed, 1472 insertions(+), 1 deletion(-) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============================ +Mappable notifications queue +============================ + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. 
This can be used in conjunction with: + + * Key/keyring notifications + + * General device event notifications + + +The notifications buffers can be enabled by: + + "Device Drivers"/"Misc devices"/"Mappable notification queue" + (CONFIG_WATCH_QUEUE) + +This document has the following sections: + +.. contents:: :local: + + +Overview +======== + +This facility appears as a misc device file that is opened and then mapped and +polled. Each time it is opened, it creates a new buffer specific to the +returned file descriptor. Then, when the opening process sets watches, it +indicates the particular buffer it wants notifications from that watch to be +written into. Note that there are no read() and write() methods (except for +debugging). The user is expected to access the ring directly and to use poll +to wait for new data. + +If a watch is in place, notifications are only written into the buffer if the +filter criteria are passed and if there's sufficient space available in the +ring. If either condition is not met, the notification is discarded. If the +cause was lack of space, an overrun indicator will also be set. + +Note that when producing a notification, the kernel does not wait for the +consumers to collect it, but rather just continues on. This means that +notifications can be generated whilst spinlocks are held and also protects the +kernel from being held up indefinitely by a userspace malfunction. + +As far as the ring goes, the head index belongs to the kernel and the tail +index belongs to userspace. The kernel will refuse to write anything if the +tail index becomes invalid. Userspace *must* use appropriate memory barriers +between reading or updating the tail index and reading the ring. 
+ + +Record Structure +================ + +Notification records in the ring may occupy a variable number of slots within +the buffer, beginning with a 1-slot header:: + + struct watch_notification { + __u32 type:24; + __u32 subtype:8; + __u32 info; + } __attribute__((aligned(WATCH_LENGTH_GRANULARITY))); + +"type" indicates the source of the notification record and "subtype" indicates +the type of record from that source (see the Watch Sources section below). The +type may also be "WATCH_TYPE_META". This is a special record type generated +internally by the watch queue driver itself. There are two subtypes: + + * WATCH_META_SKIP_NOTIFICATION + * WATCH_META_REMOVAL_NOTIFICATION + +The former indicates a record that should just be skipped (padding or +metadata) and the latter indicates that an object on which a watch was +installed was removed or destroyed. + +"info" indicates a bunch of things, including: + + * The length of the record in units of buffer slots (mask with + WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates + the size of the record, which may be between 1 and 63 slots. To turn this + into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY. + + * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT). + This indicates the caller's ID for the watch, which may be between 0 + and 255. Multiple watches may share a queue, and this provides a means to + distinguish them. + + * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST) + that indicates that some notifications were lost for some reason, including + buffer overrun, insufficient memory and inconsistent tail index. + + * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the + notification producer to indicate some meaning specific to the type and + subtype. + +Everything in info apart from the length can be used for filtering. 
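The "mask and shift" handling of the info word described in the Record Structure section can be sketched as below. This is only an illustration under stated assumptions: the EX_* masks and shifts are hypothetical values chosen to match the documented ranges (length 1 to 63 slots, ID 0 to 255). The real values are WATCH_INFO_LENGTH, WATCH_INFO_ID and friends in the UAPI header and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: illustrative bit layout, not copied from the patch. */
#define EX_INFO_LENGTH        0x0000003fu /* 6 bits: 1-63 slots */
#define EX_INFO_LENGTH_SHIFT  0
#define EX_INFO_ID            0x0000ff00u /* 8 bits: watch ID 0-255 */
#define EX_INFO_ID_SHIFT      8
#define EX_LENGTH_GRANULARITY 8u          /* bytes per slot, as documented */

/* Record length in buffer slots. */
static inline unsigned int ex_info_slots(uint32_t info)
{
	return (info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
}

/* Record length in bytes. */
static inline unsigned int ex_info_bytes(uint32_t info)
{
	return ex_info_slots(info) * EX_LENGTH_GRANULARITY;
}

/* The ID that the watcher supplied when setting the watch. */
static inline unsigned int ex_info_watch_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
}

/* Compose an info word from a slot count and a watch ID. */
static inline uint32_t ex_info_make(unsigned int slots, unsigned int id)
{
	assert(slots >= 1 && slots <= 63);
	assert(id <= 255);
	return ((uint32_t)slots << EX_INFO_LENGTH_SHIFT) |
	       ((uint32_t)id << EX_INFO_ID_SHIFT);
}
```

A consumer would apply ex_info_slots() to each record header to find the start of the next record, and ex_info_watch_id() to tell apart several watches feeding the same queue.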
+ + +Ring Structure +============== + +The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The +caller uses an ioctl() to set the size of the ring after opening and this must +be a power-of-2 multiple of the system page size (so that the mask can be used +with AND). + +The head and tail indices are stored in the first two slots in the ring, which +are marked out as a skippable entry:: + + struct watch_queue_buffer { + union { + struct { + struct watch_notification watch; + volatile __u32 head; + volatile __u32 tail; + __u32 mask; + } meta; + struct watch_notification slots[0]; + }; + }; + +In "meta.watch", type will be set to WATCH_TYPE_META and subtype to +WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just +skip this record. Also, because this record is here, records cannot wrap round +the end of the buffer, so a skippable padding element will be inserted at the +end of the buffer if needed. Thus the contents of a notification record in the +buffer are always contiguous. + +"meta.mask" is an AND'able mask to turn the index counters into slots array +indices. + +The buffer is empty if "meta.head" == "meta.tail". + +[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into +"slots[]" not byte offsets into the buffer. + +[!] NOTE that userspace must never change the head pointer. This belongs to +the kernel and will be updated by that. The kernel will never change the tail +pointer. + +[!] NOTE that userspace must never AND-off the tail pointer before updating it, +but should just keep adding to it and letting it wrap naturally. The value +*should* be masked off when used as an index into slots[]. + +[!] NOTE that if the distance between head and tail becomes too great, the +kernel will assume the buffer is full and write no more until the issue is +resolved. 
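The ring rules above (unmasked indices, masking only when indexing slots[], skipping META/SKIP records, userspace updating only the tail) can be put together in a consumer loop roughly as follows. This is a sketch, not the keyutils implementation: the ex_/EX_ names are hypothetical, the ring is a fixed 16-slot in-memory stand-in for the mmap()'d buffer, the subtype value for a skip record and the length mask are assumptions, and C11 atomics stand in for whatever barriers a real consumer would use against the kernel's smp_store_release() on the head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define EX_GRAN              8u    /* slot size in bytes, as documented */
#define EX_NR_SLOTS          16u   /* demo size; must be a power of 2 */
#define EX_TYPE_META         0     /* WATCH_TYPE_META is 0 in the patch */
#define EX_META_SKIP         0     /* ASSUMPTION: value of the skip subtype */
#define EX_INFO_LENGTH       0x3fu /* ASSUMPTION: length mask */
#define EX_INFO_LENGTH_SHIFT 0     /* ASSUMPTION: length shift */

struct ex_notification {
	uint32_t type:24;
	uint32_t subtype:8;
	uint32_t info;
};

struct ex_ring {
	union {
		struct {
			struct ex_notification watch;
			_Atomic uint32_t head; /* written by the producer */
			_Atomic uint32_t tail; /* written by the consumer */
			uint32_t mask;
		} meta;
		struct ex_notification slots[EX_NR_SLOTS];
	};
};

typedef void (*ex_handler)(const struct ex_notification *n, void *ctx);

/* Set up the metadata record the way watch_queue_set_size() does. */
static void ex_ring_init(struct ex_ring *buf)
{
	uint32_t metalen = (sizeof(buf->meta) + EX_GRAN - 1) / EX_GRAN;

	buf->meta.watch.type = EX_TYPE_META;
	buf->meta.watch.subtype = EX_META_SKIP;
	buf->meta.watch.info = metalen << EX_INFO_LENGTH_SHIFT;
	buf->meta.mask = EX_NR_SLOTS - 1;
	atomic_store_explicit(&buf->meta.head, metalen, memory_order_relaxed);
	atomic_store_explicit(&buf->meta.tail, metalen, memory_order_relaxed);
}

/* Trivial handler for demonstration: counts the records it is given. */
static void ex_count_handler(const struct ex_notification *n, void *ctx)
{
	(void)n;
	++*(unsigned int *)ctx;
}

/* Drain the ring; returns the number of non-skip records handed to fn. */
static unsigned int ex_ring_consume(struct ex_ring *buf, ex_handler fn,
				    void *ctx)
{
	unsigned int seen = 0;
	uint32_t tail, head;

	assert(buf != NULL && fn != NULL);
	tail = atomic_load_explicit(&buf->meta.tail, memory_order_relaxed);
	/* Acquire: record contents must be read after the head index. */
	head = atomic_load_explicit(&buf->meta.head, memory_order_acquire);

	while (tail != head) {
		/* Mask only when indexing, never when storing the index. */
		const struct ex_notification *n =
			&buf->slots[tail & buf->meta.mask];
		uint32_t len = (n->info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;

		if (len == 0 || len > head - tail)
			break; /* malformed record: stop rather than spin */

		if (!(n->type == EX_TYPE_META && n->subtype == EX_META_SKIP)) {
			fn(n, ctx);
			seen++;
		}

		tail += len;
		/* Release: finish reading the record before freeing its slots. */
		atomic_store_explicit(&buf->meta.tail, tail,
				      memory_order_release);
	}
	return seen;
}
```

Because the metadata record sits in slot 0 and is itself a skip record, the loop needs no special case for wrap-around: when the masked tail lands on slot 0 it simply steps over the header like any other padding. A real consumer would call something like this after poll() reports the fd readable.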
+ + +Watch List (Notification Source) API +==================================== + +A "watch list" is a list of watchers that are subscribed to a source of +notifications. A list may be attached to an object (say a key or a superblock) +or may be global (say for device events). From a userspace perspective, a +non-global watch list is typically referred to by reference to the object it +belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to +watch that specific key). + +To manage a watch list, the following functions are provided: + + * ``void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *watch));`` + + Initialise a watch list. If ``release_watch`` is not NULL, then this + indicates a function that should be called when the watch_list object is + destroyed to discard any references the watch list holds on the watched + object. + + * ``void remove_watch_list(struct watch_list *wlist, u64 id);`` + + This removes all of the watches subscribed to a watch_list and frees them + and then destroys the watch_list object itself. The ``id`` is reported in + the removal notification sent to each watcher. + + +Watch Queue (Notification Buffer) API +===================================== + +A "watch queue" is the buffer allocated by or on behalf of the application that +notification records will be written into. The workings of this are hidden +entirely inside of the watch_queue device driver, but it is necessary to gain a +reference to it to place a watch. These can be managed with: + + * ``struct watch_queue *get_watch_queue(int fd);`` + + Since watch queues are indicated to the kernel by the fd of the character + device that implements the buffer, userspace must hand that fd through a + system call. This can be used to look up an opaque pointer to the watch + queue from the system call. + + * ``void put_watch_queue(struct watch_queue *wqueue);`` + + This discards the reference obtained from ``get_watch_queue()``. 
+
+
+Watch Subscription API
+======================
+
+A "watch" is a subscription on a watch list, indicating the watch queue, and
+thus the buffer, into which notification records should be written.  The watch
+queue object may also carry filtering rules for that object, as set by
+userspace.  Some parts of the watch struct can be set by the driver::
+
+	struct watch {
+		union {
+			u32 info_id; /* ID to be OR'd in to info field */
+			...
+		};
+		void *private; /* Private data for the watched object */
+		u64 id; /* Internal identifier */
+		...
+	};
+
+The ``info_id`` value should be an 8-bit number obtained from userspace and
+shifted by WATCH_INFO_ID__SHIFT.  This is OR'd into the WATCH_INFO_ID field of
+struct watch_notification::info when and if the notification is written into
+the associated watch queue buffer.
+
+The ``private`` field is the driver's data associated with the watch_list and
+is cleaned up by the ``watch_list::release_watch()`` method.
+
+The ``id`` field is the source's ID.  Notifications that are posted with a
+different ID are ignored.
+
+The following functions are provided to manage watches:
+
+  * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);``
+
+    Initialise a watch object, setting its pointer to the watch queue, using
+    appropriate barriering to avoid lockdep complaints.
+
+  * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);``
+
+    Subscribe a watch to a watch list (notification source).  The
+    driver-settable fields in the watch struct must have been set before this
+    is called.  Returns ``-EBUSY`` if the same watch queue already has a watch
+    with the same ID on this list.
+
+  * ``int remove_watch_from_object(struct watch_list *wlist,
+                                   struct watch_queue *wqueue,
+                                   u64 id, false);``
+
+    Remove a watch from a watch list, where the watch must match the specified
+    watch queue (``wqueue``) and object identifier (``id``).  A notification
+    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to
+    indicate that the watch got removed.
+
+  * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);``
+
+    Remove all the watches from a watch list.  It is expected that this will be
+    called preparatory to destruction and that the watch list will be
+    inaccessible to new watches by this point.  A notification
+    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each
+    subscribed watch to indicate that the watch got removed.
+
+
+Notification Posting API
+========================
+
+To post a notification to a watch list so that the subscribed watches can see
+it, the following function should be used::
+
+	void post_watch_notification(struct watch_list *wlist,
+				     struct watch_notification *n,
+				     const struct cred *cred,
+				     u64 id);
+
+The notification should be preformatted and a pointer to the header (``n``)
+should be passed in.  The notification may be larger than this and the size in
+units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``.
+
+The ``cred`` struct indicates the credentials of the source (subject) and is
+passed to the LSMs, such as SELinux, to allow or suppress the recording of the
+note in each individual queue according to the credentials of that queue
+(object).
+
+The ``id`` is the ID of the source object (such as the serial number on a key).
+Only watches that have the same ID set in them will see this notification.
+
+
+Watch Sources
+=============
+
+Any particular buffer can be fed from multiple sources.  Sources include:
+
+  * WATCH_TYPE_KEY_NOTIFY
+
+    Notifications of this type indicate changes to keys and keyrings, including
+    the changes of keyring contents or the attributes of keys.
+
+    See Documentation/security/keys/core.rst for more information.
+
+  * WATCH_TYPE_BLOCK_NOTIFY
+
+    Notifications of this type indicate block layer events, such as I/O errors
+    or temporary link loss.  Watches of this type are set on a global queue.
+
+
+Event Filtering
+===============
+
+Once a watch queue has been created, a set of filters can be applied to limit
+the events that are received using::
+
+	struct watch_notification_filter filter = {
+		...
+	};
+	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
+
+The filter description is a variable of type::
+
+	struct watch_notification_filter {
+		__u32 nr_filters;
+		__u32 __reserved;
+		struct watch_notification_type_filter filters[];
+	};
+
+Where "nr_filters" is the number of filters in filters[] and "__reserved"
+should be 0.  The "filters" array has elements of the following type::
+
+	struct watch_notification_type_filter {
+		__u32 type;
+		__u32 info_filter;
+		__u32 info_mask;
+		__u32 subtype_filter[8];
+	};
+
+Where:
+
+  * ``type`` is the event type to filter for and should be something like
+    "WATCH_TYPE_KEY_NOTIFY"
+
+  * ``info_filter`` and ``info_mask`` act as a filter on the info field of the
+    notification record.  The notification is only written into the buffer if::
+
+	(watch.info & info_mask) == info_filter
+
+    This could be used, for example, to ignore events that are not exactly on
+    the watched point in a mount tree.
+
+  * ``subtype_filter`` is a bitmask indicating the subtypes that are of
+    interest.  Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to
+    subtype 1, and so on.
+
+If the argument to the ioctl() is NULL, then the filters will be removed and
+all events from the watched sources will come through.
+
+
+Waiting For Events
+==================
+
+The file descriptor that holds the buffer may be used with poll() and similar.
+POLLIN and POLLRDNORM are set if the buffer indices differ.  POLLERR is set if
+the buffer indices are further apart than the size of the buffer.  Wake-up
+events are only generated if the buffer is transitioned from an empty state.
+
+
+Userspace Code Example
+======================
+
+A buffer is created with something like the following::
+
+	fd = open("/dev/watch_queue", O_RDWR);
+
+	#define BUF_SIZE 4
+	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
+
+	page_size = sysconf(_SC_PAGESIZE);
+	buf = mmap(NULL, BUF_SIZE * page_size,
+		   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+It can then be set to receive keyring change notifications and device event
+notifications::
+
+	keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01);
+
+	watch_devices(fd, 0x2);
+
+The notifications can then be consumed by something like the following::
+
+	extern void saw_key_change(struct watch_notification *n);
+	extern void saw_block_event(struct watch_notification *n);
+	extern void saw_usb_event(struct watch_notification *n);
+
+	static int consumer(int fd, struct watch_queue_buffer *buf)
+	{
+		struct watch_notification *n;
+		struct pollfd p[1];
+		unsigned int len, head, tail, mask = buf->meta.mask;
+
+		for (;;) {
+			p[0].fd = fd;
+			p[0].events = POLLIN | POLLERR;
+			p[0].revents = 0;
+
+			if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR)
+				goto went_wrong;
+
+			/* Load head with acquire semantics before reading
+			 * the records it covers.
+			 */
+			while (head = __atomic_load_n(&buf->meta.head,
+						      __ATOMIC_ACQUIRE),
+			       tail = buf->meta.tail,
+			       tail != head
+			       ) {
+				n = &buf->slots[tail & mask];
+				len = (n->info & WATCH_INFO_LENGTH) >>
+					WATCH_INFO_LENGTH__SHIFT;
+				if (len == 0)
+					goto went_wrong;
+
+				switch (n->type) {
+				case WATCH_TYPE_KEY_NOTIFY:
+					saw_key_change(n);
+					break;
+				case WATCH_TYPE_BLOCK_NOTIFY:
+					saw_block_event(n);
+					break;
+				case WATCH_TYPE_USB_NOTIFY:
+					saw_usb_event(n);
+					break;
+				}
+
+				/* Release the slots only after the record
+				 * has been processed.
+				 */
+				tail += len;
+				__atomic_store_n(&buf->meta.tail, tail,
+						 __ATOMIC_RELEASE);
+			}
+		}
+
+	went_wrong:
+		return 0;
+	}
+
+Note the memory barriers when loading the head pointer and storing the tail
+pointer!  The ``__atomic_load_n()`` and ``__atomic_store_n()`` calls are the
+GCC/Clang builtins; any equivalent acquire load and release store will do.
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..b3fc59b4ef6c --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,898 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or + * userspace mucked up the ring indices). + */ +static bool write_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + struct watch_queue_buffer *buf = wqueue->buffer; + struct watch_notification *p; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + unsigned int size = wqueue->size, mask = size - 1; + unsigned int len; + unsigned int ring_tail, tail, head, used, gap, h; + + /* Barrier against userspace, ordering data read before tail read */ + ring_tail = READ_ONCE(buf->meta.tail); + + head = READ_ONCE(buf->meta.head); + used = head - ring_tail; + + /* Check to see if userspace mucked up the pointers */ + if (used >= size) + goto lost_event; /* Inconsistent */ + tail = ring_tail & mask; + if (tail > 0 && tail < metalen) + goto lost_event; /* Inconsistent */ + + len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + h = head & mask; + if (h >= tail) { + /* Head is at or after tail in the buffer. There may then be + * two gaps: one to the end of buffer and one at the beginning + * of the buffer between the metadata block and the tail + * pointer. + */ + gap = size - h; + if (len > gap) { + /* Not enough space in the post-head gap; we need to + * wrap. When wrapping, we will have to skip the + * metadata at the beginning of the buffer. 
+ */ + if (len > tail - metalen) + goto lost_event; /* Overrun */ + + /* Fill the space at the end of the page */ + p = &buf->slots[h]; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = gap << WATCH_INFO_LENGTH__SHIFT; + head += gap; + h = 0; + if (h >= tail) + goto lost_event; /* Overrun */ + } + } + + if (h == 0) { + /* Reset and skip the header metadata */ + p = &buf->meta.watch; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = metalen << WATCH_INFO_LENGTH__SHIFT; + head += metalen; + h = metalen; + if (h == tail) + goto lost_event; /* Overrun */ + } + + if (h < tail) { + /* Head is before tail in the buffer. */ + gap = tail - h; + if (len > gap) + goto lost_event; /* Overrun */ + } + + n->info |= wqueue->flag_next; + wqueue->flag_next = 0; + p = &buf->slots[h]; + memcpy(p, n, len * gran); + head += len; + + /* Barrier against userspace, ordering head update after data write. */ + smp_store_release(&buf->meta.head, head); + if (used == 0) + wake_up(&wqueue->waiters); + return true; + +lost_event: + WRITE_ONCE(buf->meta.watch.info, + buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST); + return false; +} + +/* + * Post a notification to a watch queue. + */ +static bool post_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + bool done = false; + + if (!wqueue->buffer) + return false; + + spin_lock_bh(&wqueue->lock); /* Protect head pointer */ + + if (!wqueue->defunct) + done = write_one_notification(wqueue, n); + spin_unlock_bh(&wqueue->lock); + return done; +} + +/* + * Apply filter rules to a notification. 
+ */ +static bool filter_watch_notification(const struct watch_filter *wf, + const struct watch_notification *n) +{ + const struct watch_type_filter *wt; + unsigned int st_bits = sizeof(wt->subtype_filter[0]) * 8; + unsigned int st_index = n->subtype / st_bits; + unsigned int st_bit = 1U << (n->subtype % st_bits); + int i; + + if (!test_bit(n->type, wf->type_filter)) + return false; + + for (i = 0; i < wf->nr_filters; i++) { + wt = &wf->filters[i]; + if (n->type == wt->type && + (wt->subtype_filter[st_index] & st_bit) && + (n->info & wt->info_mask) == wt->info_filter) + return true; + } + + return false; /* If there is a filter, the default is to reject. */ +} + +/** + * __post_watch_notification - Post an event notification + * @wlist: The watch list to post the event to. + * @n: The notification record to post. + * @cred: The creds of the process that triggered the notification. + * @id: The ID to match on the watch. + * + * Post a notification of an event into a set of watch queues and let the users + * know. + * + * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and + * should be in units of sizeof(*n). 
+ */ +void __post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + const struct watch_filter *wf; + struct watch_queue *wqueue; + struct watch *watch; + + if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) { + WARN_ON(1); + return; + } + + rcu_read_lock(); + + hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) { + if (watch->id != id) + continue; + n->info &= ~WATCH_INFO_ID; + n->info |= watch->info_id; + + wqueue = rcu_dereference(watch->queue); + wf = rcu_dereference(wqueue->filter); + if (wf && !filter_watch_notification(wf, n)) + continue; + + if (security_post_notification(watch->cred, cred, n) < 0) + continue; + + post_one_notification(wqueue, n); + } + + rcu_read_unlock(); +} +EXPORT_SYMBOL(__post_watch_notification); + +/* + * Allow the queue to be polled. + */ +static __poll_t watch_queue_poll(struct file *file, poll_table *wait) +{ + struct watch_queue *wqueue = file->private_data; + struct watch_queue_buffer *buf = wqueue->buffer; + unsigned int head, tail; + __poll_t mask = 0; + + if (!buf) + return EPOLLERR; + + poll_wait(file, &wqueue->waiters, wait); + + head = READ_ONCE(buf->meta.head); + tail = READ_ONCE(buf->meta.tail); + if (head != tail) + mask |= EPOLLIN | EPOLLRDNORM; + if (head - tail > wqueue->size) + mask |= EPOLLERR; + return mask; +} + +static int watch_queue_set_page_dirty(struct page *page) +{ + SetPageDirty(page); + return 0; +} + +static const struct address_space_operations watch_queue_aops = { + .set_page_dirty = watch_queue_set_page_dirty, +}; + +static vm_fault_t watch_queue_fault(struct vm_fault *vmf) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + page = wqueue->pages[vmf->pgoff]; + get_page(page); + if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { + put_page(page); + return VM_FAULT_RETRY; + } + vmf->page = page; + return VM_FAULT_LOCKED; +} + +static int 
watch_queue_account_mem(struct watch_queue *wqueue,
+				   unsigned long nr_pages)
+{
+	struct user_struct *user = wqueue->owner;
+	unsigned long page_limit, cur_pages, new_pages;
+
+	/* Don't allow more pages than we can safely lock */
+	page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	cur_pages = atomic_long_read(&user->locked_vm);
+
+	/* try_cmpxchg returns true on success and refreshes cur_pages on
+	 * failure, so retry for as long as the exchange fails.
+	 */
+	do {
+		new_pages = cur_pages + nr_pages;
+		if (new_pages > page_limit && !capable(CAP_IPC_LOCK))
+			return -ENOMEM;
+	} while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages,
+						  new_pages));
+
+	wqueue->nr_pages = nr_pages;
+	return 0;
+}
+
+static void watch_queue_unaccount_mem(struct watch_queue *wqueue)
+{
+	struct user_struct *user = wqueue->owner;
+
+	if (wqueue->nr_pages) {
+		atomic_long_sub(wqueue->nr_pages, &user->locked_vm);
+		wqueue->nr_pages = 0;
+	}
+}
+
+static void watch_queue_map_pages(struct vm_fault *vmf,
+				  pgoff_t start_pgoff, pgoff_t end_pgoff)
+{
+	struct watch_queue *wqueue = vmf->vma->vm_file->private_data;
+	struct page *page;
+
+	rcu_read_lock();
+
+	do {
+		page = wqueue->pages[start_pgoff];
+		if (trylock_page(page)) {
+			vm_fault_t ret;
+			get_page(page);
+			ret = alloc_set_pte(vmf, NULL, page);
+			if (ret != 0)
+				put_page(page);
+
+			unlock_page(page);
+		}
+	} while (++start_pgoff < end_pgoff);
+
+	rcu_read_unlock();
+}
+
+static const struct vm_operations_struct watch_queue_vm_ops = {
+	.fault = watch_queue_fault,
+	.map_pages = watch_queue_map_pages,
+};
+
+/*
+ * Map the buffer.
+ */ +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + u8 nr_pages; + + inode_lock(inode); + nr_pages = wqueue->nr_pages; + inode_unlock(inode); + + if (nr_pages == 0 || + vma->vm_pgoff != 0 || + vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE || + !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED))) + return -EINVAL; + + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_ops = &watch_queue_vm_ops; + return 0; +} + +/* + * Allocate the required number of pages. + */ +static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages) +{ + struct watch_queue_buffer *buf; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + int i; + + BUILD_BUG_ON(gran != sizeof(__u64)); + + if (wqueue->buffer) + return -EBUSY; + + if (nr_pages == 0 || + nr_pages > 16 || /* TODO: choose a better hard limit */ + !is_power_of_2(nr_pages)) + return -EINVAL; + + if (watch_queue_account_mem(wqueue, nr_pages) < 0) + goto err; + + wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!wqueue->pages) + goto err_unaccount; + + for (i = 0; i < nr_pages; i++) { + wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!wqueue->pages[i]) + goto err_some_pages; + wqueue->pages[i]->mapping = &wqueue->mapping; + SetPageUptodate(wqueue->pages[i]); + } + + buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED); + if (!buf) + goto err_some_pages; + + wqueue->buffer = buf; + wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification)); + + /* The first four slots in the buffer contain metadata about the ring, + * including the head and tail indices and mask. 
+ */ + buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT; + buf->meta.watch.type = WATCH_TYPE_META; + buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION; + buf->meta.mask = wqueue->size - 1; + buf->meta.head = metalen; + buf->meta.tail = metalen; + return 0; + +err_some_pages: + for (i--; i >= 0; i--) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + put_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + wqueue->pages = NULL; +err_unaccount: + watch_queue_unaccount_mem(wqueue); +err: + return -ENOMEM; +} + +/* + * Set the filter on a watch queue. + */ +static long watch_queue_set_filter(struct inode *inode, + struct watch_queue *wqueue, + struct watch_notification_filter __user *_filter) +{ + struct watch_notification_type_filter *tf; + struct watch_notification_filter filter; + struct watch_type_filter *q; + struct watch_filter *wfilter; + int ret, nr_filter = 0, i; + + if (!_filter) { + /* Remove the old filter */ + wfilter = NULL; + goto set; + } + + /* Grab the user's filter specification */ + if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) + return -EFAULT; + if (filter.nr_filters == 0 || + filter.nr_filters > 16 || + filter.__reserved != 0) + return -EINVAL; + + tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + if (IS_ERR(tf)) + return PTR_ERR(tf); + + ret = -EINVAL; + for (i = 0; i < filter.nr_filters; i++) { + if ((tf[i].info_filter & ~tf[i].info_mask) || + tf[i].info_mask & WATCH_INFO_LENGTH) + goto err_filter; + /* Ignore any unknown types */ + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + nr_filter++; + } + + /* Now we need to build the internal filter from only the relevant + * user-specified filters. 
+	 */
+	ret = -ENOMEM;
+	wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL);
+	if (!wfilter)
+		goto err_filter;
+	wfilter->nr_filters = nr_filter;
+
+	q = wfilter->filters;
+	for (i = 0; i < filter.nr_filters; i++) {
+		/* This must use the same bound as the counting loop above,
+		 * otherwise more entries could be copied than were allocated.
+		 */
+		if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
+			continue;
+
+		q->type = tf[i].type;
+		q->info_filter = tf[i].info_filter;
+		q->info_mask = tf[i].info_mask;
+		q->subtype_filter[0] = tf[i].subtype_filter[0];
+		__set_bit(q->type, wfilter->type_filter);
+		q++;
+	}
+
+	kfree(tf);
+set:
+	inode_lock(inode);
+	rcu_swap_protected(wqueue->filter, wfilter,
+			   lockdep_is_held(&inode->i_rwsem));
+	inode_unlock(inode);
+	if (wfilter)
+		kfree_rcu(wfilter, rcu);
+	return 0;
+
+err_filter:
+	kfree(tf);
+	return ret;
+}
+
+/*
+ * Set parameters.
+ */
+static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct watch_queue *wqueue = file->private_data;
+	struct inode *inode = file_inode(file);
+	long ret;
+
+	switch (cmd) {
+	case IOC_WATCH_QUEUE_SET_SIZE:
+		inode_lock(inode);
+		ret = watch_queue_set_size(wqueue, arg);
+		inode_unlock(inode);
+		return ret;
+
+	case IOC_WATCH_QUEUE_SET_FILTER:
+		ret = watch_queue_set_filter(
+			inode, wqueue,
+			(struct watch_notification_filter __user *)arg);
+		return ret;
+
+	default:
+		return -ENOTTY;
+	}
+}
+
+/*
+ * Open the file.
+ */
+static int watch_queue_open(struct inode *inode, struct file *file)
+{
+	struct watch_queue *wqueue;
+
+	wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL);
+	if (!wqueue)
+		return -ENOMEM;
+
+	wqueue->mapping.a_ops = &watch_queue_aops;
+	wqueue->mapping.i_mmap = RB_ROOT_CACHED;
+	init_rwsem(&wqueue->mapping.i_mmap_rwsem);
+	spin_lock_init(&wqueue->mapping.private_lock);
+
+	kref_init(&wqueue->usage);
+	spin_lock_init(&wqueue->lock);
+	init_waitqueue_head(&wqueue->waiters);
+	wqueue->owner = get_uid(file->f_cred->user);
+
+	file->private_data = wqueue;
+	return 0;
+}
+
+static void __put_watch_queue(struct kref *kref)
+{
+	struct watch_queue *wqueue =
+		container_of(kref, struct watch_queue, usage);
+	struct watch_filter *wfilter;
+
+	wfilter = rcu_access_pointer(wqueue->filter);
+	if (wfilter)
+		kfree_rcu(wfilter, rcu);
+	free_uid(wqueue->owner);
+	kfree_rcu(wqueue, rcu);
+}
+
+/**
+ * put_watch_queue - Dispose of a ref on a watchqueue.
+ * @wqueue: The watch queue to unref.
+ */
+void put_watch_queue(struct watch_queue *wqueue)
+{
+	kref_put(&wqueue->usage, __put_watch_queue);
+}
+EXPORT_SYMBOL(put_watch_queue);
+
+static void free_watch(struct rcu_head *rcu)
+{
+	struct watch *watch = container_of(rcu, struct watch, rcu);
+
+	put_watch_queue(rcu_access_pointer(watch->queue));
+	put_cred(watch->cred);
+}
+
+static void __put_watch(struct kref *kref)
+{
+	struct watch *watch = container_of(kref, struct watch, usage);
+
+	call_rcu(&watch->rcu, free_watch);
+}
+
+/*
+ * Discard a watch.
+ */
+static void put_watch(struct watch *watch)
+{
+	kref_put(&watch->usage, __put_watch);
+}
+
+/**
+ * init_watch - Initialise a watch
+ * @watch: The watch to initialise.
+ * @wqueue: The queue to assign.
+ *
+ * Initialise a watch and set the watch queue.
+ */ +void init_watch(struct watch *watch, struct watch_queue *wqueue) +{ + kref_init(&watch->usage); + INIT_HLIST_NODE(&watch->list_node); + INIT_HLIST_NODE(&watch->queue_node); + rcu_assign_pointer(watch->queue, wqueue); +} + +/** + * add_watch_to_object - Add a watch on an object to a watch list + * @watch: The watch to add + * @wlist: The watch list to add to + * + * @watch->queue must have been set to point to the queue to post notifications + * to and the watch list of the object to be watched. @watch->cred must also + * have been set to the appropriate credentials and a ref taken on them. + * + * The caller must pin the queue and the list both and must hold the list + * locked against racing watch additions/removals. + */ +int add_watch_to_object(struct watch *watch, struct watch_list *wlist) +{ + struct watch_queue *wqueue = rcu_access_pointer(watch->queue); + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + watch->cred = get_current_cred(); + rcu_assign_pointer(watch->watch_list, wlist); + + spin_lock_bh(&wqueue->lock); + kref_get(&wqueue->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_unlock_bh(&wqueue->lock); + + hlist_add_head(&watch->list_node, &wlist->watchers); + return 0; +} +EXPORT_SYMBOL(add_watch_to_object); + +/** + * remove_watch_from_object - Remove a watch or all watches from an object. + * @wlist: The watch list to remove from + * @wq: The watch queue of interest (ignored if @all is true) + * @id: The ID of the watch to remove (ignored if @all is true) + * @all: True to remove all objects + * + * Remove a specific watch or all watches from an object. A notification is + * sent to the watcher to tell them that this happened. 
+ */ +int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, + u64 id, bool all) +{ + struct watch_notification_removal n; + struct watch_queue *wqueue; + struct watch *watch; + int ret = -EBADSLT; + + rcu_read_lock(); + +again: + spin_lock(&wlist->lock); + hlist_for_each_entry(watch, &wlist->watchers, list_node) { + if (all || + (watch->id == id && rcu_access_pointer(watch->queue) == wq)) + goto found; + } + spin_unlock(&wlist->lock); + goto out; + +found: + ret = 0; + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + spin_unlock(&wlist->lock); + + /* We now own the reference on watch that used to belong to wlist. */ + + n.watch.type = WATCH_TYPE_META; + n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; + n.watch.info = watch->info_id | watch_sizeof(n.watch); + n.id = id; + if (id != 0) + n.watch.info = watch->info_id | watch_sizeof(n); + + wqueue = rcu_dereference(watch->queue); + + /* We don't need the watch list lock for the next bit as RCU is + * protecting *wqueue from deallocation. + */ + if (wqueue) { + post_one_notification(wqueue, &n.watch); + + spin_lock_bh(&wqueue->lock); + + if (!hlist_unhashed(&watch->queue_node)) { + hlist_del_init_rcu(&watch->queue_node); + put_watch(watch); + } + + spin_unlock_bh(&wqueue->lock); + } + + if (wlist->release_watch) { + void (*release_watch)(struct watch *); + + release_watch = wlist->release_watch; + rcu_read_unlock(); + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + + if (all && !hlist_empty(&wlist->watchers)) + goto again; +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(remove_watch_from_object); + +/* + * Remove all the watches that are contributory to a queue. This has the + * potential to race with removal of the watches by the destruction of the + * objects being watched or with the distribution of notifications. 
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */ +static int watch_queue_release(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue = file->private_data; + int i; + + watch_queue_clear(wqueue); + + if (wqueue->buffer) + vunmap(wqueue->buffer); + + for (i = 0; i < wqueue->nr_pages; i++) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + __free_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + watch_queue_unaccount_mem(wqueue); + put_watch_queue(wqueue); + return 0; +} + +static const struct file_operations watch_queue_fops = { + .owner = THIS_MODULE, + .open = watch_queue_open, + .release = watch_queue_release, + .unlocked_ioctl = watch_queue_ioctl, + .poll = watch_queue_poll, + .mmap = watch_queue_mmap, + .llseek = no_llseek, +}; + +/** + * get_watch_queue - Get a watch queue from its file descriptor. + * @fd: The fd to query. + */ +struct watch_queue *get_watch_queue(int fd) +{ + struct watch_queue *wqueue = ERR_PTR(-EBADF); + struct fd f; + + f = fdget(fd); + if (f.file) { + wqueue = ERR_PTR(-EINVAL); + if (f.file->f_op == &watch_queue_fops) { + wqueue = f.file->private_data; + kref_get(&wqueue->usage); + } + fdput(f); + } + + return wqueue; +} +EXPORT_SYMBOL(get_watch_queue); + +static struct miscdevice watch_queue_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "watch_queue", + .fops = &watch_queue_fops, + .mode = 0666, +}; +builtin_misc_device(watch_queue_dev); diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index 917d88edb7b9..126494d917bf 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -33,7 +33,8 @@ struct user_struct { kuid_t uid; #if defined(CONFIG_PERF_EVENTS) || defined(CONFIG_BPF_SYSCALL) || \ - defined(CONFIG_NET) || defined(CONFIG_IO_URING) + defined(CONFIG_NET) || defined(CONFIG_IO_URING) || \ + defined(CONFIG_WATCH_QUEUE) atomic_long_t locked_vm; #endif diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h new file mode 100644 index 000000000000..34d7915cc5b3 --- 
/dev/null +++ b/include/linux/watch_queue.h @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. 
+ */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + +static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. 
+ */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. + * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #8] @ 2019-09-04 22:16 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw) To: keyrings, linux-usb, linux-block Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Implement a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink). 
The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 898 ++++++++++++++++++++++++++++++++++ include/linux/sched/user.h | 3 include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 8 files changed, 1472 insertions(+), 1 deletion(-) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============== +Mappable notifications queue +============== + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. 
This can be used in conjunction with:: + + * Key/keyring notifications + + * General device event notifications + + +The notifications buffers can be enabled by: + + "Device Drivers"/"Misc devices"/"Mappable notification queue" + (CONFIG_WATCH_QUEUE) + +This document has the following sections: + +.. contents:: :local: + + +Overview +==== + +This facility appears as a misc device file that is opened and then mapped and +polled. Each time it is opened, it creates a new buffer specific to the +returned file descriptor. Then, when the opening process sets watches, it +indicates the particular buffer it wants notifications from that watch to be +written into. Note that there are no read() and write() methods (except for +debugging). The user is expected to access the ring directly and to use poll +to wait for new data. + +If a watch is in place, notifications are only written into the buffer if the +filter criteria are passed and if there's sufficient space available in the +ring. If neither of those is so, a notification will be discarded. In the +latter case, an overrun indicator will also be set. + +Note that when producing a notification, the kernel does not wait for the +consumers to collect it, but rather just continues on. This means that +notifications can be generated whilst spinlocks are held and also protects the +kernel from being held up indefinitely by a userspace malfunction. + +As far as the ring goes, the head index belongs to the kernel and the tail +index belongs to userspace. The kernel will refuse to write anything if the +tail index becomes invalid. Userspace *must* use appropriate memory barriers +between reading or updating the tail index and reading the ring. 
+ + +Record Structure +================ + +Notification records in the ring may occupy a variable number of slots within +the buffer, beginning with a 1-slot header:: + + struct watch_notification { + __u32 type:24; + __u32 subtype:8; + __u32 info; + } __attribute__((aligned(WATCH_LENGTH_GRANULARITY))); + +"type" indicates the source of the notification record and "subtype" indicates +the type of record from that source (see the Watch Sources section below). The +type may also be "WATCH_TYPE_META". This is a special record type generated +internally by the watch queue driver itself. There are two subtypes, one of +which indicates records that should be just skipped (padding or metadata): + + * WATCH_META_SKIP_NOTIFICATION + * WATCH_META_REMOVAL_NOTIFICATION + +The former indicates a record that should just be skipped and the latter +indicates that an object on which a watch was installed was removed or +destroyed. + +"info" indicates a bunch of things, including: + + * The length of the record in units of buffer slots (mask with + WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates + the size of the record, which may be between 1 and 63 slots. To turn this + into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY. + + * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT). + This indicates the caller's ID of the watch, which may be between 0 + and 255. Multiple watches may share a queue, and this provides a means to + distinguish them. + + * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST) + that indicates that some notifications were lost for some reason, including + buffer overrun, insufficient memory and inconsistent tail index. + + * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the + notification producer to indicate some meaning specific to the type and + subtype. + +Everything in info apart from the length can be used for filtering. 
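The mask-and-shift decoding of the "info" field described above can be sketched in C as follows. This is only an illustration: the `EX_`-prefixed constants and the `ex_*` helpers are hypothetical names, and the numeric values are assumptions chosen to match the ranges stated in the text (length 1 to 63 slots, ID 0 to 255). Real code should take the authoritative WATCH_INFO_* definitions from include/uapi/linux/watch_queue.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative values; the real ones live in <linux/watch_queue.h>. */
#define EX_WATCH_INFO_LENGTH        0x0000003fu /* record length in slots */
#define EX_WATCH_INFO_LENGTH__SHIFT 0
#define EX_WATCH_INFO_ID            0x0000ff00u /* caller-chosen watch ID */
#define EX_WATCH_INFO_ID__SHIFT     8
#define EX_WATCH_LENGTH_GRANULARITY 8u          /* bytes per slot */

/* Number of slots occupied by the record, header included. */
static inline unsigned int ex_record_slots(uint32_t info)
{
    return (info & EX_WATCH_INFO_LENGTH) >> EX_WATCH_INFO_LENGTH__SHIFT;
}

/* Size of the record in bytes. */
static inline unsigned int ex_record_bytes(uint32_t info)
{
    return ex_record_slots(info) * EX_WATCH_LENGTH_GRANULARITY;
}

/* The 8-bit ID that userspace supplied when the watch was set. */
static inline unsigned int ex_watch_id(uint32_t info)
{
    return (info & EX_WATCH_INFO_ID) >> EX_WATCH_INFO_ID__SHIFT;
}

/* Compose an info word from a slot count and a watch ID (for testing). */
static inline uint32_t ex_make_info(unsigned int slots, unsigned int id)
{
    assert(slots >= 1 && slots <= 63);
    assert(id <= 255);
    return ((uint32_t)slots << EX_WATCH_INFO_LENGTH__SHIFT) |
           ((uint32_t)id << EX_WATCH_INFO_ID__SHIFT);
}
```

A consumer would typically call `ex_record_slots()` on each header it reads to know how far to advance the tail index, and `ex_watch_id()` to tell apart several watches feeding the same queue.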
+ + +Ring Structure +============== + +The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The +caller uses an ioctl() to set the size of the ring after opening and this must +be a power-of-2 multiple of the system page size (so that the mask can be used +with AND). + +The head and tail indices are stored in the first two slots in the ring, which +are marked out as a skippable entry:: + + struct watch_queue_buffer { + union { + struct { + struct watch_notification watch; + volatile __u32 head; + volatile __u32 tail; + __u32 mask; + } meta; + struct watch_notification slots[0]; + }; + }; + +In "meta.watch", type will be set to WATCH_TYPE_META and subtype to +WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just +skip this record. Also, because this record is here, records cannot wrap round +the end of the buffer, so a skippable padding element will be inserted at the +end of the buffer if needed. Thus the contents of a notification record in the +buffer are always contiguous. + +"meta.mask" is an AND'able mask to turn the index counters into slots array +indices. + +The buffer is empty if "meta.head" == "meta.tail". + +[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into +"slots[]" not byte offsets into the buffer. + +[!] NOTE that userspace must never change the head pointer. This belongs to +the kernel and will be updated by that. The kernel will never change the tail +pointer. + +[!] NOTE that userspace must never AND-off the tail pointer before updating it, +but should just keep adding to it and letting it wrap naturally. The value +*should* be masked off when used as an index into slots[]. + +[!] NOTE that if the distance between head and tail becomes too great, the +kernel will assume the buffer is full and write no more until the issue is +resolved. 
+ + +Watch List (Notification Source) API +==================================== + +A "watch list" is a list of watchers that are subscribed to a source of +notifications. 
A list may be attached to an object (say a key or a superblock) +or may be global (say for device events). From a userspace perspective, a +non-global watch list is typically referred to by reference to the object it +belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to +watch that specific key). + +To manage a watch list, the following functions are provided: + + * ``void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *watch));`` + + Initialise a watch list. If ``release_watch`` is not NULL, then this + indicates a function that should be called when the watch_list object is + destroyed to discard any references the watch list holds on the watched + object. + + * ``void remove_watch_list(struct watch_list *wlist, u64 id);`` + + This removes all of the watches subscribed to a watch_list and frees them + and then destroys the watch_list object itself. + + +Watch Queue (Notification Buffer) API +===================================== + +A "watch queue" is the buffer allocated by or on behalf of the application that +notification records will be written into. The workings of this are hidden +entirely inside of the watch_queue device driver, but it is necessary to gain a +reference to it to place a watch. These can be managed with: + + * ``struct watch_queue *get_watch_queue(int fd);`` + + Since watch queues are indicated to the kernel by the fd of the character + device that implements the buffer, userspace must hand that fd through a + system call. This can be used to look up an opaque pointer to the watch + queue from the system call. + + * ``void put_watch_queue(struct watch_queue *wqueue);`` + + This discards the reference obtained from ``get_watch_queue()``. + + +Watch Subscription API +====================== + +A "watch" is a subscription on a watch list, indicating the watch queue, and +thus the buffer, into which notification records should be written. 
The watch +queue object may also carry filtering rules for that object, as set by +userspace. 
Some parts of the watch struct can be set by the driver:: + + struct watch { + union { + u32 info_id; /* ID to be OR'd in to info field */ + ... + }; + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + ... + }; + +The ``info_id`` value should be an 8-bit number obtained from userspace and +shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of +struct watch_notification::info when and if the notification is written into +the associated watch queue buffer. + +The ``private`` field is the driver's data associated with the watch_list and +is cleaned up by the ``watch_list::release_watch()`` method. + +The ``id`` field is the source's ID. Notifications that are posted with a +different ID are ignored. + +The following functions are provided to manage watches: + + * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);`` + + Initialise a watch object, setting its pointer to the watch queue, using + appropriate barriering to avoid lockdep complaints. + + * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);`` + + Subscribe a watch to a watch list (notification source). The + driver-settable fields in the watch struct must have been set before this + is called. + + * ``int remove_watch_from_object(struct watch_list *wlist, + struct watch_queue *wqueue, + u64 id, false);`` + + Remove a watch from a watch list, where the watch must match the specified + watch queue (``wqueue``) and object identifier (``id``). A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to + indicate that the watch got removed. + + * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);`` + + Remove all the watches from a watch list. It is expected that this will be + called preparatory to destruction and that the watch list will be + inaccessible to new watches by this point. 
A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each + subscribed watch to indicate that the watch got removed. + + +Notification Posting API +======================== + +To post a notification to a watch list so that the subscribed watches can see it, +the following function should be used:: + + void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id); + +The notification should be preformatted and a pointer to the header (``n``) +should be passed in. The notification may be larger than this and the size in +units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``. + +The ``cred`` struct indicates the credentials of the source (subject) and is +passed to the LSMs, such as SELinux, to allow or suppress the recording of the +note in each individual queue according to the credentials of that queue +(object). + +The ``id`` is the ID of the source object (such as the serial number on a key). +Only watches that have the same ID set in them will see this notification. + + +Watch Sources +============= + +Any particular buffer can be fed from multiple sources. Sources include: + + * WATCH_TYPE_KEY_NOTIFY + + Notifications of this type indicate changes to keys and keyrings, including + the changes of keyring contents or the attributes of keys. + + See Documentation/security/keys/core.rst for more information. + + * WATCH_TYPE_BLOCK_NOTIFY + + Notifications of this type indicate block layer events, such as I/O errors + or temporary link loss. Watches of this type are set on a global queue. + + +Event Filtering +=============== + +Once a watch queue has been created, a set of filters can be applied to limit +the events that are received using:: + + struct watch_notification_filter filter = { + ... 
+ }; + ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) + +The filter description is a variable of type:: + + struct watch_notification_filter { + __u32 nr_filters; + __u32 __reserved; + struct watch_notification_type_filter filters[]; + }; + +Where "nr_filters" is the number of filters in filters[] and "__reserved" +should be 0. The "filters" array has elements of the following type:: + + struct watch_notification_type_filter { + __u32 type; + __u32 info_filter; + __u32 info_mask; + __u32 subtype_filter[8]; + }; + +Where: + + * ``type`` is the event type to filter for and should be something like + "WATCH_TYPE_KEY_NOTIFY" + + * ``info_filter`` and ``info_mask`` act as a filter on the info field of the + notification record. The notification is only written into the buffer if:: + + (watch.info & info_mask) == info_filter + + This could be used, for example, to ignore events that are not exactly on + the watched point in a mount tree. + + * ``subtype_filter`` is a bitmask indicating the subtypes that are of + interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to + subtype 1, and so on. + +If the argument to the ioctl() is NULL, then the filters will be removed and +all events from the watched sources will come through. + + +Waiting For Events +================== + +The file descriptor that holds the buffer may be used with poll() and similar. +POLLIN and POLLRDNORM are set if the buffer indices differ. POLLERR is set if +the buffer indices are further apart than the size of the buffer. Wake-up +events are only generated if the buffer is transitioned from an empty state. 
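The filtering rules from the Event Filtering section (a per-type subtype bitmask combined with a mask-and-compare on the info field) can be sketched in C as below. This is a hedged userspace illustration, not the kernel's implementation: `struct ex_type_filter` is a local mirror of `struct watch_notification_type_filter`, the `ex_*` helpers are hypothetical names, and the sketch assumes that the bit numbering simply continues into `subtype_filter[1]` and beyond (subtype 32 is bit 0 of `subtype_filter[1]`, and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct watch_notification_type_filter, for illustration. */
struct ex_type_filter {
    uint32_t type;              /* Type to apply the filter to */
    uint32_t info_filter;       /* Value (info & info_mask) must equal */
    uint32_t info_mask;         /* Relevant bits of the info field */
    uint32_t subtype_filter[8]; /* One bit per subtype, 256 in total */
};

/* Mark one subtype (0..255) as being of interest. */
static inline void ex_filter_add_subtype(struct ex_type_filter *f,
                                         unsigned int subtype)
{
    assert(subtype < 256);
    f->subtype_filter[subtype / 32] |= 1u << (subtype % 32);
}

/* Would a record with this type, subtype and info pass the filter? */
static inline bool ex_filter_accepts(const struct ex_type_filter *f,
                                     uint32_t type, unsigned int subtype,
                                     uint32_t info)
{
    if (subtype > 255)
        return false;
    return type == f->type &&
           (f->subtype_filter[subtype / 32] & (1u << (subtype % 32))) &&
           (info & f->info_mask) == f->info_filter;
}
```

Leaving `info_mask` and `info_filter` both zero makes the info comparison always succeed, so only the type and subtype bits decide whether a record passes.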
+ + +Userspace Code Example +====================== + +A buffer is created with something like the following:: + + fd = open("/dev/watch_queue", O_RDWR); + + #define BUF_SIZE 4 + ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + +It can then be set to receive keyring change notifications and device event +notifications:: + + keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01); + + watch_devices(fd, 0x2); + +The notifications can then be consumed by something like the following:: + + extern void saw_key_change(struct watch_notification *n); + extern void saw_block_event(struct watch_notification *n); + extern void saw_usb_event(struct watch_notification *n); + + static int consumer(int fd, struct watch_queue_buffer *buf) + { + struct watch_notification *n; + struct pollfd p[1]; + unsigned int len, head, tail, mask = buf->meta.mask; + + for (;;) { + p[0].fd = fd; + p[0].events = POLLIN | POLLERR; + p[0].revents = 0; + + if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR) + goto went_wrong; + + while (head = _atomic_load_acquire(buf->meta.head), + tail = buf->meta.tail, + tail != head + ) { + n = &buf->slots[tail & mask]; + len = (n->info & WATCH_INFO_LENGTH) >> + WATCH_INFO_LENGTH__SHIFT; + if (len == 0) + goto went_wrong; + + switch (n->type) { + case WATCH_TYPE_KEY_NOTIFY: + saw_key_change(n); + break; + case WATCH_TYPE_BLOCK_NOTIFY: + saw_block_event(n); + break; + case WATCH_TYPE_USB_NOTIFY: + saw_usb_event(n); + break; + } + + tail += len; + _atomic_store_release(buf->meta.tail, tail); + } + } + + went_wrong: + return 0; + } + +Note the memory barriers when loading the head pointer and storing the tail +pointer! 
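The acquire and release accesses that the example above spells as `_atomic_load_acquire()` and `_atomic_store_release()` could be written with the GCC/Clang `__atomic` builtins, as in this minimal sketch. It is an assumption that these builtins are an acceptable stand-in for whatever barrier helpers a real consumer uses. `struct ex_ring` and the `ex_ring_*` helpers are hypothetical names that model only the head, tail and mask words of the metadata record. The sketch also shows the index rules from the Ring Structure section: the counters are free-running, are subtracted without masking, and are masked only when used as a slot index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the head/tail/mask words of the ring metadata. */
struct ex_ring {
    uint32_t head; /* Written by the producer (the kernel) */
    uint32_t tail; /* Written by the consumer (userspace) */
    uint32_t mask; /* Number of slots minus one */
};

/* Slots waiting to be consumed; unsigned subtraction handles wraparound. */
static inline uint32_t ex_ring_used(struct ex_ring *r)
{
    /* Acquire: record contents must not be read before head is read. */
    uint32_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);

    return head - r->tail;
}

/* Turn a free-running counter into an index into slots[]. */
static inline uint32_t ex_ring_index(const struct ex_ring *r, uint32_t counter)
{
    return counter & r->mask;
}

/* Hand nslots consumed slots back to the producer. */
static inline void ex_ring_advance(struct ex_ring *r, uint32_t nslots)
{
    assert(nslots > 0);
    /* Release: reads of the record must finish before tail moves on.
     * The tail counter is not masked; it is left to wrap naturally.
     */
    __atomic_store_n(&r->tail, r->tail + nslots, __ATOMIC_RELEASE);
}
```

Because the subtraction is done on unsigned 32-bit counters, `ex_ring_used()` stays correct when head has wrapped past zero and tail has not yet done so.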
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..b3fc59b4ef6c --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,898 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or + * userspace mucked up the ring indices). + */ +static bool write_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + struct watch_queue_buffer *buf = wqueue->buffer; + struct watch_notification *p; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + unsigned int size = wqueue->size, mask = size - 1; + unsigned int len; + unsigned int ring_tail, tail, head, used, gap, h; + + /* Barrier against userspace, ordering data read before tail read */ + ring_tail = READ_ONCE(buf->meta.tail); + + head = READ_ONCE(buf->meta.head); + used = head - ring_tail; + + /* Check to see if userspace mucked up the pointers */ + if (used >= size) + goto lost_event; /* Inconsistent */ + tail = ring_tail & mask; + if (tail > 0 && tail < metalen) + goto lost_event; /* Inconsistent */ + + len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + h = head & mask; + if (h >= tail) { + /* Head is at or after tail in the buffer. There may then be + * two gaps: one to the end of buffer and one at the beginning + * of the buffer between the metadata block and the tail + * pointer. + */ + gap = size - h; + if (len > gap) { + /* Not enough space in the post-head gap; we need to + * wrap. When wrapping, we will have to skip the + * metadata at the beginning of the buffer. 
+ */ + if (len > tail - metalen) + goto lost_event; /* Overrun */ + + /* Fill the space at the end of the page */ + p = &buf->slots[h]; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = gap << WATCH_INFO_LENGTH__SHIFT; + head += gap; + h = 0; + if (h >= tail) + goto lost_event; /* Overrun */ + } + } + + if (h == 0) { + /* Reset and skip the header metadata */ + p = &buf->meta.watch; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = metalen << WATCH_INFO_LENGTH__SHIFT; + head += metalen; + h = metalen; + if (h == tail) + goto lost_event; /* Overrun */ + } + + if (h < tail) { + /* Head is before tail in the buffer. */ + gap = tail - h; + if (len > gap) + goto lost_event; /* Overrun */ + } + + n->info |= wqueue->flag_next; + wqueue->flag_next = 0; + p = &buf->slots[h]; + memcpy(p, n, len * gran); + head += len; + + /* Barrier against userspace, ordering head update after data write. */ + smp_store_release(&buf->meta.head, head); + if (used == 0) + wake_up(&wqueue->waiters); + return true; + +lost_event: + WRITE_ONCE(buf->meta.watch.info, + buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST); + return false; +} + +/* + * Post a notification to a watch queue. + */ +static bool post_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + bool done = false; + + if (!wqueue->buffer) + return false; + + spin_lock_bh(&wqueue->lock); /* Protect head pointer */ + + if (!wqueue->defunct) + done = write_one_notification(wqueue, n); + spin_unlock_bh(&wqueue->lock); + return done; +} + +/* + * Apply filter rules to a notification. 
+ */ +static bool filter_watch_notification(const struct watch_filter *wf, + const struct watch_notification *n) +{ + const struct watch_type_filter *wt; + unsigned int st_bits = sizeof(wt->subtype_filter[0]) * 8; + unsigned int st_index = n->subtype / st_bits; + unsigned int st_bit = 1U << (n->subtype % st_bits); + int i; + + if (!test_bit(n->type, wf->type_filter)) + return false; + + for (i = 0; i < wf->nr_filters; i++) { + wt = &wf->filters[i]; + if (n->type == wt->type && + (wt->subtype_filter[st_index] & st_bit) && + (n->info & wt->info_mask) == wt->info_filter) + return true; + } + + return false; /* If there is a filter, the default is to reject. */ +} + +/** + * __post_watch_notification - Post an event notification + * @wlist: The watch list to post the event to. + * @n: The notification record to post. + * @cred: The creds of the process that triggered the notification. + * @id: The ID to match on the watch. + * + * Post a notification of an event into a set of watch queues and let the users + * know. + * + * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and + * should be in units of sizeof(*n). 
+ */ +void __post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + const struct watch_filter *wf; + struct watch_queue *wqueue; + struct watch *watch; + + if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) { + WARN_ON(1); + return; + } + + rcu_read_lock(); + + hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) { + if (watch->id != id) + continue; + n->info &= ~WATCH_INFO_ID; + n->info |= watch->info_id; + + wqueue = rcu_dereference(watch->queue); + wf = rcu_dereference(wqueue->filter); + if (wf && !filter_watch_notification(wf, n)) + continue; + + if (security_post_notification(watch->cred, cred, n) < 0) + continue; + + post_one_notification(wqueue, n); + } + + rcu_read_unlock(); +} +EXPORT_SYMBOL(__post_watch_notification); + +/* + * Allow the queue to be polled. + */ +static __poll_t watch_queue_poll(struct file *file, poll_table *wait) +{ + struct watch_queue *wqueue = file->private_data; + struct watch_queue_buffer *buf = wqueue->buffer; + unsigned int head, tail; + __poll_t mask = 0; + + if (!buf) + return EPOLLERR; + + poll_wait(file, &wqueue->waiters, wait); + + head = READ_ONCE(buf->meta.head); + tail = READ_ONCE(buf->meta.tail); + if (head != tail) + mask |= EPOLLIN | EPOLLRDNORM; + if (head - tail > wqueue->size) + mask |= EPOLLERR; + return mask; +} + +static int watch_queue_set_page_dirty(struct page *page) +{ + SetPageDirty(page); + return 0; +} + +static const struct address_space_operations watch_queue_aops = { + .set_page_dirty = watch_queue_set_page_dirty, +}; + +static vm_fault_t watch_queue_fault(struct vm_fault *vmf) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + page = wqueue->pages[vmf->pgoff]; + get_page(page); + if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { + put_page(page); + return VM_FAULT_RETRY; + } + vmf->page = page; + return VM_FAULT_LOCKED; +} + +static int 
watch_queue_account_mem(struct watch_queue *wqueue, + unsigned long nr_pages) +{ + struct user_struct *user = wqueue->owner; + unsigned long page_limit, cur_pages, new_pages; + + /* Don't allow more pages than we can safely lock */ + page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + cur_pages = atomic_long_read(&user->locked_vm); + + do { + new_pages = cur_pages + nr_pages; + if (new_pages > page_limit && !capable(CAP_IPC_LOCK)) + return -ENOMEM; + } while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages, + new_pages)); + + wqueue->nr_pages = nr_pages; + return 0; +} + +static void watch_queue_unaccount_mem(struct watch_queue *wqueue) +{ + struct user_struct *user = wqueue->owner; + + if (wqueue->nr_pages) { + atomic_long_sub(wqueue->nr_pages, &user->locked_vm); + wqueue->nr_pages = 0; + } +} + +static void watch_queue_map_pages(struct vm_fault *vmf, + pgoff_t start_pgoff, pgoff_t end_pgoff) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + rcu_read_lock(); + + do { + page = wqueue->pages[start_pgoff]; + if (trylock_page(page)) { + vm_fault_t ret; + get_page(page); + ret = alloc_set_pte(vmf, NULL, page); + if (ret != 0) + put_page(page); + + unlock_page(page); + } + } while (++start_pgoff < end_pgoff); + + rcu_read_unlock(); +} + +static const struct vm_operations_struct watch_queue_vm_ops = { + .fault = watch_queue_fault, + .map_pages = watch_queue_map_pages, +}; + +/* + * Map the buffer.
+ */ +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + u8 nr_pages; + + inode_lock(inode); + nr_pages = wqueue->nr_pages; + inode_unlock(inode); + + if (nr_pages == 0 || + vma->vm_pgoff != 0 || + vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE || + !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED))) + return -EINVAL; + + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_ops = &watch_queue_vm_ops; + return 0; +} + +/* + * Allocate the required number of pages. + */ +static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages) +{ + struct watch_queue_buffer *buf; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + int i; + + BUILD_BUG_ON(gran != sizeof(__u64)); + + if (wqueue->buffer) + return -EBUSY; + + if (nr_pages == 0 || + nr_pages > 16 || /* TODO: choose a better hard limit */ + !is_power_of_2(nr_pages)) + return -EINVAL; + + if (watch_queue_account_mem(wqueue, nr_pages) < 0) + goto err; + + wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!wqueue->pages) + goto err_unaccount; + + for (i = 0; i < nr_pages; i++) { + wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!wqueue->pages[i]) + goto err_some_pages; + wqueue->pages[i]->mapping = &wqueue->mapping; + SetPageUptodate(wqueue->pages[i]); + } + + buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED); + if (!buf) + goto err_some_pages; + + wqueue->buffer = buf; + wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification)); + + /* The first four slots in the buffer contain metadata about the ring, + * including the head and tail indices and mask.
+ */ + buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT; + buf->meta.watch.type = WATCH_TYPE_META; + buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION; + buf->meta.mask = wqueue->size - 1; + buf->meta.head = metalen; + buf->meta.tail = metalen; + return 0; + +err_some_pages: + for (i--; i >= 0; i--) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + put_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + wqueue->pages = NULL; +err_unaccount: + watch_queue_unaccount_mem(wqueue); +err: + return -ENOMEM; +} + +/* + * Set the filter on a watch queue. + */ +static long watch_queue_set_filter(struct inode *inode, + struct watch_queue *wqueue, + struct watch_notification_filter __user *_filter) +{ + struct watch_notification_type_filter *tf; + struct watch_notification_filter filter; + struct watch_type_filter *q; + struct watch_filter *wfilter; + int ret, nr_filter = 0, i; + + if (!_filter) { + /* Remove the old filter */ + wfilter = NULL; + goto set; + } + + /* Grab the user's filter specification */ + if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) + return -EFAULT; + if (filter.nr_filters == 0 || + filter.nr_filters > 16 || + filter.__reserved != 0) + return -EINVAL; + + tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + if (IS_ERR(tf)) + return PTR_ERR(tf); + + ret = -EINVAL; + for (i = 0; i < filter.nr_filters; i++) { + if ((tf[i].info_filter & ~tf[i].info_mask) || + tf[i].info_mask & WATCH_INFO_LENGTH) + goto err_filter; + /* Ignore any unknown types */ + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + nr_filter++; + } + + /* Now we need to build the internal filter from only the relevant + * user-specified filters.
+ */ + ret = -ENOMEM; + wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL); + if (!wfilter) + goto err_filter; + wfilter->nr_filters = nr_filter; + + q = wfilter->filters; + for (i = 0; i < filter.nr_filters; i++) { + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + + q->type = tf[i].type; + q->info_filter = tf[i].info_filter; + q->info_mask = tf[i].info_mask; + q->subtype_filter[0] = tf[i].subtype_filter[0]; + __set_bit(q->type, wfilter->type_filter); + q++; + } + + kfree(tf); +set: + inode_lock(inode); + rcu_swap_protected(wqueue->filter, wfilter, + lockdep_is_held(&inode->i_rwsem)); + inode_unlock(inode); + if (wfilter) + kfree_rcu(wfilter, rcu); + return 0; + +err_filter: + kfree(tf); + return ret; +} + +/* + * Set parameters. + */ +static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + long ret; + + switch (cmd) { + case IOC_WATCH_QUEUE_SET_SIZE: + inode_lock(inode); + ret = watch_queue_set_size(wqueue, arg); + inode_unlock(inode); + return ret; + + case IOC_WATCH_QUEUE_SET_FILTER: + ret = watch_queue_set_filter( + inode, wqueue, + (struct watch_notification_filter __user *)arg); + return ret; + + default: + return -ENOTTY; + } +} + +/* + * Open the file.
+ */ +static int watch_queue_open(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue; + + wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL); + if (!wqueue) + return -ENOMEM; + + wqueue->mapping.a_ops = &watch_queue_aops; + wqueue->mapping.i_mmap = RB_ROOT_CACHED; + init_rwsem(&wqueue->mapping.i_mmap_rwsem); + spin_lock_init(&wqueue->mapping.private_lock); + + kref_init(&wqueue->usage); + spin_lock_init(&wqueue->lock); + init_waitqueue_head(&wqueue->waiters); + wqueue->owner = get_uid(file->f_cred->user); + + file->private_data = wqueue; + return 0; +} + +static void __put_watch_queue(struct kref *kref) +{ + struct watch_queue *wqueue = + container_of(kref, struct watch_queue, usage); + struct watch_filter *wfilter; + + wfilter = rcu_access_pointer(wqueue->filter); + if (wfilter) + kfree_rcu(wfilter, rcu); + free_uid(wqueue->owner); + kfree_rcu(wqueue, rcu); +} + +/** + * put_watch_queue - Dispose of a ref on a watchqueue. + * @wqueue: The watch queue to unref. + */ +void put_watch_queue(struct watch_queue *wqueue) +{ + kref_put(&wqueue->usage, __put_watch_queue); +} +EXPORT_SYMBOL(put_watch_queue); + +static void free_watch(struct rcu_head *rcu) +{ + struct watch *watch = container_of(rcu, struct watch, rcu); + + put_watch_queue(rcu_access_pointer(watch->queue)); + put_cred(watch->cred); +} + +static void __put_watch(struct kref *kref) +{ + struct watch *watch = container_of(kref, struct watch, usage); + + call_rcu(&watch->rcu, free_watch); +} + +/* + * Discard a watch. + */ +static void put_watch(struct watch *watch) +{ + kref_put(&watch->usage, __put_watch); +} + +/** + * init_watch - Initialise a watch + * @watch: The watch to initialise. + * @wqueue: The queue to assign. + * + * Initialise a watch and set the watch queue.
+ */ +void init_watch(struct watch *watch, struct watch_queue *wqueue) +{ + kref_init(&watch->usage); + INIT_HLIST_NODE(&watch->list_node); + INIT_HLIST_NODE(&watch->queue_node); + rcu_assign_pointer(watch->queue, wqueue); +} + +/** + * add_watch_to_object - Add a watch on an object to a watch list + * @watch: The watch to add + * @wlist: The watch list to add to + * + * @watch->queue must have been set to point to the queue to post notifications + * to and the watch list of the object to be watched. @watch->cred must also + * have been set to the appropriate credentials and a ref taken on them. + * + * The caller must pin the queue and the list both and must hold the list + * locked against racing watch additions/removals. + */ +int add_watch_to_object(struct watch *watch, struct watch_list *wlist) +{ + struct watch_queue *wqueue = rcu_access_pointer(watch->queue); + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + watch->cred = get_current_cred(); + rcu_assign_pointer(watch->watch_list, wlist); + + spin_lock_bh(&wqueue->lock); + kref_get(&wqueue->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_unlock_bh(&wqueue->lock); + + hlist_add_head(&watch->list_node, &wlist->watchers); + return 0; +} +EXPORT_SYMBOL(add_watch_to_object); + +/** + * remove_watch_from_object - Remove a watch or all watches from an object. + * @wlist: The watch list to remove from + * @wq: The watch queue of interest (ignored if @all is true) + * @id: The ID of the watch to remove (ignored if @all is true) + * @all: True to remove all objects + * + * Remove a specific watch or all watches from an object. A notification is + * sent to the watcher to tell them that this happened.
+ */ +int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, + u64 id, bool all) +{ + struct watch_notification_removal n; + struct watch_queue *wqueue; + struct watch *watch; + int ret = -EBADSLT; + + rcu_read_lock(); + +again: + spin_lock(&wlist->lock); + hlist_for_each_entry(watch, &wlist->watchers, list_node) { + if (all || + (watch->id == id && rcu_access_pointer(watch->queue) == wq)) + goto found; + } + spin_unlock(&wlist->lock); + goto out; + +found: + ret = 0; + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + spin_unlock(&wlist->lock); + + /* We now own the reference on watch that used to belong to wlist. */ + + n.watch.type = WATCH_TYPE_META; + n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; + n.watch.info = watch->info_id | watch_sizeof(n.watch); + n.id = id; + if (id != 0) + n.watch.info = watch->info_id | watch_sizeof(n); + + wqueue = rcu_dereference(watch->queue); + + /* We don't need the watch list lock for the next bit as RCU is + * protecting *wqueue from deallocation. + */ + if (wqueue) { + post_one_notification(wqueue, &n.watch); + + spin_lock_bh(&wqueue->lock); + + if (!hlist_unhashed(&watch->queue_node)) { + hlist_del_init_rcu(&watch->queue_node); + put_watch(watch); + } + + spin_unlock_bh(&wqueue->lock); + } + + if (wlist->release_watch) { + void (*release_watch)(struct watch *); + + release_watch = wlist->release_watch; + rcu_read_unlock(); + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + + if (all && !hlist_empty(&wlist->watchers)) + goto again; +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(remove_watch_from_object); + +/* + * Remove all the watches that are contributory to a queue. This has the + * potential to race with removal of the watches by the destruction of the + * objects being watched or with the distribution of notifications.
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */ +static int watch_queue_release(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue = file->private_data; + int i; + + watch_queue_clear(wqueue); + + if (wqueue->buffer) + vunmap(wqueue->buffer); + + for (i = 0; i < wqueue->nr_pages; i++) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + __free_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + watch_queue_unaccount_mem(wqueue); + put_watch_queue(wqueue); + return 0; +} + +static const struct file_operations watch_queue_fops = { + .owner = THIS_MODULE, + .open = watch_queue_open, + .release = watch_queue_release, + .unlocked_ioctl = watch_queue_ioctl, + .poll = watch_queue_poll, + .mmap = watch_queue_mmap, + .llseek = no_llseek, +}; + +/** + * get_watch_queue - Get a watch queue from its file descriptor. + * @fd: The fd to query. + */ +struct watch_queue *get_watch_queue(int fd) +{ + struct watch_queue *wqueue = ERR_PTR(-EBADF); + struct fd f; + + f = fdget(fd); + if (f.file) { + wqueue = ERR_PTR(-EINVAL); + if (f.file->f_op == &watch_queue_fops) { + wqueue = f.file->private_data; + kref_get(&wqueue->usage); + } + fdput(f); + } + + return wqueue; +} +EXPORT_SYMBOL(get_watch_queue); + +static struct miscdevice watch_queue_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "watch_queue", + .fops = &watch_queue_fops, + .mode = 0666, +}; +builtin_misc_device(watch_queue_dev); diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index 917d88edb7b9..126494d917bf 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -33,7 +33,8 @@ struct user_struct { kuid_t uid; #if defined(CONFIG_PERF_EVENTS) || defined(CONFIG_BPF_SYSCALL) || \ - defined(CONFIG_NET) || defined(CONFIG_IO_URING) + defined(CONFIG_NET) || defined(CONFIG_IO_URING) || \ + defined(CONFIG_WATCH_QUEUE) atomic_long_t locked_vm; #endif diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h new file mode 100644 index 000000000000..34d7915cc5b3 --- /dev/null
+++ b/include/linux/watch_queue.h @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. 
+ */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + +static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. 
+ */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. + * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
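The filter rules defined in the uapi header above (type, subtype bitmap, info mask/value) can be modelled outside the kernel to check what a given rule set would accept. What follows is only a sketch under stated assumptions, not the kernel code: every ex_ name is hypothetical, the structure is a plain userspace mirror of struct watch_notification_type_filter, and the full 256-bit subtype bitmap is consulted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct watch_notification_type_filter. */
struct ex_type_filter {
	uint32_t type;              /* Type the rule applies to */
	uint32_t info_filter;       /* Required value of (info & info_mask) */
	uint32_t info_mask;         /* Bits of info that are compared */
	uint32_t subtype_filter[8]; /* One bit per subtype, 256 in total */
};

/* Return true if any rule accepts the (type, subtype, info) triple. */
static inline bool ex_filter_match(const struct ex_type_filter *rules,
				   size_t nr_rules, uint32_t type,
				   uint8_t subtype, uint32_t info)
{
	const unsigned int bits = 32; /* Bits per subtype_filter word */
	uint32_t bit = UINT32_C(1) << (subtype % bits);
	size_t i;

	assert(rules != NULL || nr_rules == 0);

	for (i = 0; i < nr_rules; i++) {
		const struct ex_type_filter *r = &rules[i];

		if (type == r->type &&
		    (r->subtype_filter[subtype / bits] & bit) &&
		    (info & r->info_mask) == r->info_filter)
			return true;
	}

	return false; /* With rules present, the default is to reject */
}
```

A rule with info_mask set to 0 and info_filter set to 0 accepts any info value, so such a rule selects on type and subtype only.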
* [PATCH 05/11] keys: Add a notification facility [ver #8] 2019-09-04 22:15 ` David Howells (?) @ 2019-09-04 22:16 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw) To: keyrings, linux-usb, linux-block Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, linux-security-module, linux-fsdevel, linux-api, linux-security-module, linux-kernel Add a key/keyring change notification facility whereby notifications about changes in key and keyring content and attributes can be received. Firstly, an event queue needs to be created: fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); then a notification can be set up to report notifications via that queue: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_KEY_NOTIFY, .subtype_filter[0] = UINT_MAX, }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01); After that, records will be placed into the queue when events occur in which keys are changed in some way. Records are of the following format: struct key_notification { struct watch_notification watch; __u32 key_id; __u32 aux; } *n; Where: n->watch.type will be WATCH_TYPE_KEY_NOTIFY. n->watch.subtype will indicate the type of event, such as NOTIFY_KEY_REVOKED. n->watch.info & WATCH_INFO_LENGTH will indicate the length of the record. n->watch.info & WATCH_INFO_ID will be the third argument to keyctl_watch_key(), shifted. n->key_id will be the ID of the affected key. n->aux will hold subtype-dependent information, such as the key being linked into the keyring specified by n->key_id in the case of NOTIFY_KEY_LINKED. Note that it is permissible for event records to be of variable length - or, at least, the length may be dependent on the subtype.
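Because the length can vary, a consumer should take the record size from the info field instead of assuming sizeof(struct key_notification). A minimal sketch of that check follows, under stated assumptions: the ex_ structures are local stand-ins for the uapi ones, the 24-bit type / 8-bit subtype split is inferred from the cover letter, and the EX_INFO_* mask and shift values are illustrative placeholders, not copied from the header. Only the 8-byte granularity and the type and subtype numbers come from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder mask/shift values; the real ones live in the uapi header. */
#define EX_INFO_LENGTH       0x0000003fu /* Record length in 8-byte units */
#define EX_INFO_LENGTH_SHIFT 0
#define EX_INFO_ID           0x00ff0000u /* ID given when the watch was set */
#define EX_INFO_ID_SHIFT     16
#define EX_GRANULARITY       8u          /* sizeof(__u64), as in the patch */

#define EX_TYPE_KEY_NOTIFY   1           /* WATCH_TYPE_KEY_NOTIFY */
#define EX_NOTIFY_KEY_LINKED 2           /* NOTIFY_KEY_LINKED */

/* Stand-in for struct watch_notification (assumed layout). */
struct ex_watch_notification {
	uint32_t type:24;
	uint32_t subtype:8;
	uint32_t info;
};

/* Stand-in for struct key_notification. */
struct ex_key_notification {
	struct ex_watch_notification watch;
	uint32_t key_id; /* The key/keyring affected */
	uint32_t aux;    /* Per-subtype auxiliary data */
};

/* Length of a record in bytes, taken from the info field. */
static inline size_t ex_record_bytes(uint32_t info)
{
	return ((info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT) *
		EX_GRANULARITY;
}

/* Watch ID that was supplied when the watch was set. */
static inline unsigned int ex_record_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
}

/* Return the key record if @n is a key event and long enough, else NULL. */
static inline const struct ex_key_notification *
ex_as_key_notification(const struct ex_watch_notification *n)
{
	assert(n != NULL);

	if (n->type != EX_TYPE_KEY_NOTIFY)
		return NULL;
	if (ex_record_bytes(n->info) < sizeof(struct ex_key_notification))
		return NULL;
	return (const struct ex_key_notification *)n;
}
```

Rejecting short records before the cast keeps the reader from touching key_id and aux when a record carries only the basic header, as the removal notification does.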
Note also that the queue can be shared between multiple notifications of various types. Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 ++++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++- security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 99 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 271 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. 
+ + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + watch_meta_removal_notification. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services =============== diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */ 
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_type + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 --- 
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } 
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..6610649514fb 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ? 
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,90 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_key(key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + kfree(watch); +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities. 
*/ @@ -1857,6 +1947,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); + 
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed; ^ permalink raw reply related [flat|nested] 234+ messages in thread
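The keyctl.c hunk above advertises the facility by setting KEYCTL_CAPS1_NOTIFICATIONS in byte 1 of the capabilities array when CONFIG_KEY_NOTIFICATIONS is enabled. As a minimal userspace sketch, assuming the capability bytes have already been fetched through KEYCTL_CAPABILITIES, a caller could test for support as follows. The helper name caps_have_notifications() is hypothetical and is not part of the patch; only the flag value is taken from it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag value copied from the include/uapi/linux/keyctl.h hunk of the patch. */
#define KEYCTL_CAPS1_NOTIFICATIONS 0x04

/*
 * Hypothetical helper: report whether the capability bytes returned by
 * KEYCTL_CAPABILITIES advertise key/keyring notifications.  A buffer
 * shorter than two bytes cannot carry the flag, so it counts as "no".
 */
static bool caps_have_notifications(const unsigned char *caps, size_t len)
{
	return len >= 2 && (caps[1] & KEYCTL_CAPS1_NOTIFICATIONS) != 0;
}
```

Checking the length first matters because the flag lives in the second byte, and a kernel or caller that supplies only one capability byte cannot be advertising it.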
* [PATCH 05/11] keys: Add a notification facility [ver #8]
@ 2019-09-04 22:16 ` David Howells
  0 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw)
  To: keyrings, linux-usb, linux-block
  Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a watch can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

After that, records will be placed into the queue whenever a watched key or
keyring is changed in some way.  Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the third argument to
	keyctl_watch_key() (the watch ID), shifted.

	n->key_id will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key_id in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length - or,
at least, the length may be dependent on the subtype.  Note also that the
queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 ++++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++- security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 99 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 271 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. 
+ + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + watch_meta_removal_notification. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services =============== diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */ 
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_subtype + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 ---
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } 
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..6610649514fb 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ? 
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,90 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_key(key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + kfree(watch); +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities. 
*/ @@ -1857,6 +1947,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); + 
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed; ^ permalink raw reply related [flat|nested] 234+ messages in thread
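For anyone consuming the records described in this patch, the following is a minimal userspace sketch of mapping the subtype of a WATCH_TYPE_KEY_NOTIFY record to a readable label. The enum values are copied from the include/uapi/linux/watch_queue.h hunk above; the helper name key_subtype_name() and the label strings are hypothetical and are not part of the patch or of keyutils.

```c
/* Values copied from the patch's enum key_notification_subtype. */
enum key_notification_subtype {
	NOTIFY_KEY_INSTANTIATED	= 0,	/* Key was instantiated (aux is error code) */
	NOTIFY_KEY_UPDATED	= 1,	/* Key was updated */
	NOTIFY_KEY_LINKED	= 2,	/* Key (aux) was added to watched keyring */
	NOTIFY_KEY_UNLINKED	= 3,	/* Key (aux) was removed from watched keyring */
	NOTIFY_KEY_CLEARED	= 4,	/* Keyring was cleared */
	NOTIFY_KEY_REVOKED	= 5,	/* Key was revoked */
	NOTIFY_KEY_INVALIDATED	= 6,	/* Key was invalidated */
	NOTIFY_KEY_SETATTR	= 7,	/* Key's attributes got changed */
};

/*
 * Hypothetical helper: return a label for a key notification subtype.
 * Values outside the range defined by the patch yield "unknown" so that
 * subtypes added later do not index past the end of the table.
 */
static const char *key_subtype_name(unsigned int subtype)
{
	static const char *const names[] = {
		[NOTIFY_KEY_INSTANTIATED]	= "instantiated",
		[NOTIFY_KEY_UPDATED]		= "updated",
		[NOTIFY_KEY_LINKED]		= "linked",
		[NOTIFY_KEY_UNLINKED]		= "unlinked",
		[NOTIFY_KEY_CLEARED]		= "cleared",
		[NOTIFY_KEY_REVOKED]		= "revoked",
		[NOTIFY_KEY_INVALIDATED]	= "invalidated",
		[NOTIFY_KEY_SETATTR]		= "setattr",
	};

	if (subtype >= sizeof(names) / sizeof(names[0]))
		return "unknown";
	return names[subtype];
}
```

A reader of the queue would typically call this with n->watch.subtype after confirming that n->watch.type is WATCH_TYPE_KEY_NOTIFY, and then interpret n->aux according to the subtype (an error code for instantiation, a key serial for link and unlink).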
* [PATCH 05/11] keys: Add a notification facility [ver #8]
@ 2019-09-04 22:16 ` David Howells
  0 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw)
  To: keyrings, linux-usb, linux-block
  Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a watch can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

After that, records will be placed into the queue whenever a watched key or
keyring is changed in some way.  Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the third argument to
	keyctl_watch_key() (the watch ID), shifted.

	n->key_id will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key_id in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length - or,
at least, the length may be dependent on the subtype.  Note also that the
queue can be shared between multiple notifications of various types.

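Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 ++++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++- security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 99 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 271 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. + + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + watch_meta_removal_notification. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services =============== diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */ 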
+ + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + watch_meta_removal_notification. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services ======= diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */ 
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_subtype + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 ---
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + }
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..6610649514fb 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ?
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,90 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_key(key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + kfree(watch); +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities.
*/ @@ -1857,6 +1947,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); +
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed; ^ permalink raw reply related [flat|nested] 234+ messages in thread
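As a rough illustration of the record format described in the patch 05/11 commit message, a userspace consumer might pick a key notification apart along the following lines. This is only a sketch: the ex_ structures are local stand-ins for struct watch_notification and struct key_notification, the subtype names and the WATCH_TYPE_KEY_NOTIFY value are taken from the patch, and the EX_WATCH_INFO_* mask and shift values are assumptions made for the example (the real ones live in the watch_queue UAPI header, which is not quoted here). How the record actually reaches userspace (the mmap'd ring) is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed mask layout, for illustration only; not taken from the patch. */
#define EX_WATCH_INFO_ID       UINT32_C(0x00ff0000)
#define EX_WATCH_INFO_ID_SHIFT 16

#define EX_WATCH_TYPE_KEY_NOTIFY 1 /* WATCH_TYPE_KEY_NOTIFY in the patch */

/* Local stand-ins mirroring the record layout quoted in the patch. */
struct ex_watch_notification {
	uint32_t type:24;
	uint32_t subtype:8;
	uint32_t info;
};

struct ex_key_notification {
	struct ex_watch_notification watch;
	uint32_t key_id; /* the key/keyring affected */
	uint32_t aux;    /* per-subtype auxiliary data */
};

/* Map a subtype number to the enum name used in the patch. */
static const char *ex_key_subtype_name(unsigned int subtype)
{
	static const char *const names[] = {
		"NOTIFY_KEY_INSTANTIATED", "NOTIFY_KEY_UPDATED",
		"NOTIFY_KEY_LINKED",       "NOTIFY_KEY_UNLINKED",
		"NOTIFY_KEY_CLEARED",      "NOTIFY_KEY_REVOKED",
		"NOTIFY_KEY_INVALIDATED",  "NOTIFY_KEY_SETATTR",
	};

	if (subtype >= sizeof(names) / sizeof(names[0]))
		return "unknown";
	return names[subtype];
}

/* Recover the watch ID that was passed when the watch was set. */
static unsigned int ex_watch_id(const struct ex_watch_notification *w)
{
	assert(w != NULL);
	return (w->info & EX_WATCH_INFO_ID) >> EX_WATCH_INFO_ID_SHIFT;
}

/*
 * Format a one-line description of a key event into buf.
 * Returns the snprintf() result, or -1 if the record is not a key event.
 */
static int ex_describe_key_event(const struct ex_key_notification *n,
				 char *buf, size_t len)
{
	assert(n != NULL && buf != NULL);
	if (n->watch.type != EX_WATCH_TYPE_KEY_NOTIFY)
		return -1;
	return snprintf(buf, len, "watch %u: %s key=%u aux=%u",
			ex_watch_id(&n->watch),
			ex_key_subtype_name(n->watch.subtype),
			(unsigned int)n->key_id, (unsigned int)n->aux);
}
```

For a NOTIFY_KEY_LINKED record, key_id would name the watched keyring and aux the key that was linked into it, which is why the description prints both.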
* [PATCH 06/11] Add a general, global device notification watch list [ver #8]
  2019-09-04 22:15 ` David Howells
  (?)
@ 2019-09-04 22:16 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw)
  To: keyrings, linux-usb, linux-block
  Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, linux-security-module, linux-fsdevel, linux-api, linux-security-module, linux-kernel

Create a general, global watch list that can be used for the posting of
device notification events, for such things as device attachment,
detachment and errors on sources such as block devices and USB devices.

This can be enabled with:

	CONFIG_DEVICE_NOTIFICATIONS

To add a watch on this list, an event queue must be created and configured:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

and then a watch can be placed upon it using a system call:

	watch_devices(fd, 12, 0);

Unless the application wants to receive all events, it should employ
appropriate filters.
For example, to receive just USB notifications, it could do: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_USB_NOTIFY, .subtype_filter[0] = UINT_MAX, }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/watch_queue.rst | 22 ++++++- arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/arm64/include/asm/unistd.h | 2 - arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/xtensa/kernel/syscalls/syscall.tbl | 1 drivers/base/Kconfig | 9 +++ drivers/base/Makefile | 1 drivers/base/watch.c | 90 +++++++++++++++++++++++++++ include/linux/device.h | 7 ++ include/linux/syscalls.h | 1 include/uapi/asm-generic/unistd.h | 4 + kernel/sys_ni.c | 1 26 files changed, 152 insertions(+), 3 deletions(-) create mode 100644 drivers/base/watch.c diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 6fb3aa3356d3..393905b904c8 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -276,6 +276,25 @@ The ``id`` is the ID of the source object (such as the serial number on a key). Only watches that have the same ID set in them will see this notification.
+Global Device Watch List +======================== + +There is a global watch list that hardware generated events, such as device +connection, disconnection, failure and error can be posted upon. It must be +enabled using:: + + CONFIG_DEVICE_NOTIFICATIONS + +Watchpoints are set in userspace using the watch_devices(2) system call. +Within the kernel events are posted upon it using:: + + void post_device_notification(struct watch_notification *n, u64 id); + +where ``n`` is the formatted notification record to post. ``id`` is an +identifier that can be used to direct to specific watches, but it should be 0 +for general use on this queue. + + Watch Sources ============= @@ -291,7 +310,8 @@ Any particular buffer can be fed from multiple sources. Sources include: * WATCH_TYPE_BLOCK_NOTIFY Notifications of this type indicate block layer events, such as I/O errors - or temporary link loss. Watches of this type are set on a global queue. + or temporary link loss. Watches of this type are set on the global device + watch list.
Event Filtering diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 728fe028c02c..8e841d8e4c22 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -475,3 +475,4 @@ 543 common fspick sys_fspick 544 common pidfd_open sys_pidfd_open # 545 reserved for clone3 +546 common watch_devices sys_watch_devices diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6da7dc4d79cc..0f080cf44cc9 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -449,3 +449,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 2629a68b8724..368761302768 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 436 +#define __NR_compat_syscalls 437 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 94ab29cf4f00..b5310789ce7a 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick) __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 36d5faf4c86c..2f33f5db2fed 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -356,3 +356,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for 
clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index a88a285a0e5f..83e4e8784b88 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -435,3 +435,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 09b0cd7dab0a..9a70a3be3b7b 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index c9c879ec9b6d..2ba5b649f0ab 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -374,3 +374,4 @@ 433 n32 fspick sys_fspick 434 n32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n32 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index bbce9159caa1..ff350988584d 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -350,3 +350,4 @@ 433 n64 fspick sys_fspick 434 n64 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n64 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 9653591428ec..7b26bd39900e 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -423,3 +423,4 @@ 433 o32 fspick sys_fspick 434 o32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 o32 watch_devices sys_watch_devices diff --git 
a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 670d1371aca1..d846365a4f7c 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -432,3 +432,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3_wrapper +436 common watch_devices sys_watch_devices diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 43f736ed47f2..0a503239ab5c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -517,3 +517,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 nospu clone3 ppc_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3054e9c035a3..19b43c0d928a 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick sys_fspick 434 common pidfd_open sys_pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 sys_clone3 +436 common watch_devices sys_watch_devices sys_watch_devices diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index b5ed26c4c005..b454e07c9372 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8c8cc7537fb2..8ef43c27457e 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -481,3 +481,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git 
a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index c00019abd076..0e34ddeb97a1 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -440,3 +440,4 @@ 433 i386 fspick sys_fspick __ia32_sys_fspick 434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open 435 i386 clone3 sys_clone3 __ia32_sys_clone3 +436 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c29976eca4a8..29293d103829 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -357,6 +357,7 @@ 433 common fspick __x64_sys_fspick 434 common pidfd_open __x64_sys_pidfd_open 435 common clone3 __x64_sys_clone3/ptregs +436 common watch_devices __x64_sys_watch_devices # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 25f4de729a6d..243fa18b8d1e 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -406,3 +406,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index dc404492381d..7f899cae41a0 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -1,6 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 menu "Generic Driver Options" +config DEVICE_NOTIFICATIONS + bool "Provide device event notifications" + depends on WATCH_QUEUE + help + This option provides support for getting hardware event notifications + on devices, buses and interfaces. This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + watch_devices(2) is used to set/remove watches.
+ config UEVENT_HELPER bool "Support for uevent helper" help diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..4db2e8f1a1f4 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -7,6 +7,7 @@ obj-y := component.o core.o bus.o dd.o syscore.o \ attribute_container.o transport_class.o \ topology.o container.o property.o cacheinfo.o \ devcon.o swnode.o +obj-$(CONFIG_DEVICE_NOTIFICATIONS) += watch.o obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o diff --git a/drivers/base/watch.c b/drivers/base/watch.c new file mode 100644 index 000000000000..725aaa24275b --- /dev/null +++ b/drivers/base/watch.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Event notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/device.h> +#include <linux/watch_queue.h> +#include <linux/syscalls.h> +#include <linux/init_task.h> +#include <linux/security.h> + +/* + * Global queue for watching for device layer events. + */ +static struct watch_list device_watchers = { + .watchers = HLIST_HEAD_INIT, + .lock = __SPIN_LOCK_UNLOCKED(&device_watchers.lock), +}; + +static DEFINE_SPINLOCK(device_watchers_lock); + +/** + * post_device_notification - Post notification of a device event + * @n: The notification to post + * @id: The device ID + * + * Note that there's only a global queue to which all events are posted. Might + * want to provide per-dev queues also. + */ +void post_device_notification(struct watch_notification *n, u64 id) +{ + post_watch_notification(&device_watchers, n, &init_cred, id); +} +EXPORT_SYMBOL(post_device_notification); + +/** + * sys_watch_devices - Watch for device events. + * @watch_fd: The watch queue to send notifications to.
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch) + * @flags: Flags (reserved for future) + */ +SYSCALL_DEFINE3(watch_devices, int, watch_fd, int, watch_id, unsigned int, flags) +{ + struct watch_queue *wqueue; + struct watch *watch = NULL; + long ret = -ENOMEM; + + if (watch_id < -1 || watch_id > 0xff || flags) + return -EINVAL; + + wqueue = get_watch_queue(watch_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err; + } + + if (watch_id >= 0) { + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wqueue; + + init_watch(watch, wqueue); + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_devices(); + if (ret < 0) + goto err_watch; + + spin_lock(&device_watchers_lock); + ret = add_watch_to_object(watch, &device_watchers); + spin_unlock(&device_watchers_lock); + if (ret == 0) + watch = NULL; + } else { + spin_lock(&device_watchers_lock); + ret = remove_watch_from_object(&device_watchers, wqueue, 0, + false); + spin_unlock(&device_watchers_lock); + } + +err_watch: + kfree(watch); +err_wqueue: + put_watch_queue(wqueue); +err: + return ret; +} diff --git a/include/linux/device.h b/include/linux/device.h index 6717adee33f0..9def6a53b598 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -43,6 +43,7 @@ struct iommu_group; struct iommu_fwspec; struct dev_pin_info; struct iommu_param; +struct watch_notification; struct bus_attribute { struct attribute attr; @@ -1412,6 +1413,12 @@ struct device_link *device_link_add(struct device *consumer, void device_link_del(struct device_link *link); void device_link_remove(void *consumer, struct device *supplier); +#ifdef CONFIG_DEVICE_NOTIFICATIONS +extern void post_device_notification(struct watch_notification *n, u64 id); +#else +static inline void post_device_notification(struct watch_notification *n, u64 id) {} +#endif + #ifndef dev_fmt #define dev_fmt(fmt) fmt #endif diff --git a/include/linux/syscalls.h 
b/include/linux/syscalls.h index 88145da7d140..5bac5daec51e 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags) asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); +asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1be0e798e362..fd63ff0196fd 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) #endif +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) #undef __NR_syscalls -#define __NR_syscalls 436 +#define __NR_syscalls 437 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 34b76895b81e..184ad68c087f 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents); COND_SYSCALL(io_uring_setup); COND_SYSCALL(io_uring_enter); COND_SYSCALL(io_uring_register); +COND_SYSCALL(watch_devices); /* fs/xattr.c */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 06/11] Add a general, global device notification watch list [ver #8] @ 2019-09-04 22:16 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-04 22:16 UTC (permalink / raw) To: keyrings, linux-usb, linux-block Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Create a general, global watch list that can be used for the posting of device notification events, for such things as device attachment, detachment and errors on sources such as block devices and USB devices. This can be enabled with: CONFIG_DEVICE_NOTIFICATIONS To add a watch on this list, an event queue must be created and configured: fd = open("/dev/event_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); and then a watch can be placed upon it using a system call: watch_devices(fd, 12, 0); Unless the application wants to receive all events, it should employ appropriate filters. For example, to receive just USB notifications, it could do: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_USB_NOTIFY, .subtype_filter[0] = UINT_MAX; }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/watch_queue.rst | 22 ++++++- arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/arm64/include/asm/unistd.h | 2 - arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 
arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/xtensa/kernel/syscalls/syscall.tbl | 1 drivers/base/Kconfig | 9 +++ drivers/base/Makefile | 1 drivers/base/watch.c | 90 +++++++++++++++++++++++++++ include/linux/device.h | 7 ++ include/linux/syscalls.h | 1 include/uapi/asm-generic/unistd.h | 4 + kernel/sys_ni.c | 1 26 files changed, 152 insertions(+), 3 deletions(-) create mode 100644 drivers/base/watch.c diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 6fb3aa3356d3..393905b904c8 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -276,6 +276,25 @@ The ``id`` is the ID of the source object (such as the serial number on a key). Only watches that have the same ID set in them will see this notification. +Global Device Watch List +============ + +There is a global watch list that hardware generated events, such as device +connection, disconnection, failure and error can be posted upon. It must be +enabled using:: + + CONFIG_DEVICE_NOTIFICATIONS + +Watchpoints are set in userspace using the device_notify(2) system call. +Within the kernel events are posted upon it using:: + + void post_device_notification(struct watch_notification *n, u64 id); + +where ``n`` is the formatted notification record to post. ``id`` is an +identifier that can be used to direct to specific watches, but it should be 0 +for general use on this queue. + + Watch Sources ====== @@ -291,7 +310,8 @@ Any particular buffer can be fed from multiple sources. Sources include: * WATCH_TYPE_BLOCK_NOTIFY Notifications of this type indicate block layer events, such as I/O errors - or temporary link loss. Watches of this type are set on a global queue. + or temporary link loss. Watches of this type are set on the global device + watch list. 
Event Filtering diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 728fe028c02c..8e841d8e4c22 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -475,3 +475,4 @@ 543 common fspick sys_fspick 544 common pidfd_open sys_pidfd_open # 545 reserved for clone3 +546 common watch_devices sys_watch_devices diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6da7dc4d79cc..0f080cf44cc9 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -449,3 +449,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 2629a68b8724..368761302768 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 436 +#define __NR_compat_syscalls 437 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 94ab29cf4f00..b5310789ce7a 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -879,6 +879,8 @@ __SYSCALL(__NR_fspick, sys_fspick) __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 36d5faf4c86c..2f33f5db2fed 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -356,3 +356,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for 
clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index a88a285a0e5f..83e4e8784b88 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -435,3 +435,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 09b0cd7dab0a..9a70a3be3b7b 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index c9c879ec9b6d..2ba5b649f0ab 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -374,3 +374,4 @@ 433 n32 fspick sys_fspick 434 n32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n32 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index bbce9159caa1..ff350988584d 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -350,3 +350,4 @@ 433 n64 fspick sys_fspick 434 n64 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n64 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 9653591428ec..7b26bd39900e 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -423,3 +423,4 @@ 433 o32 fspick sys_fspick 434 o32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 o32 watch_devices sys_watch_devices diff --git 
a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 670d1371aca1..d846365a4f7c 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -432,3 +432,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3_wrapper +436 common watch_devices sys_watch_devices diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 43f736ed47f2..0a503239ab5c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -517,3 +517,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 nospu clone3 ppc_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3054e9c035a3..19b43c0d928a 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick sys_fspick 434 common pidfd_open sys_pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 sys_clone3 +436 common watch_devices sys_watch_devices sys_watch_devices diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index b5ed26c4c005..b454e07c9372 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8c8cc7537fb2..8ef43c27457e 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -481,3 +481,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git 
a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index c00019abd076..0e34ddeb97a1 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -440,3 +440,4 @@ 433 i386 fspick sys_fspick __ia32_sys_fspick 434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open 435 i386 clone3 sys_clone3 __ia32_sys_clone3 +436 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c29976eca4a8..29293d103829 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -357,6 +357,7 @@ 433 common fspick __x64_sys_fspick 434 common pidfd_open __x64_sys_pidfd_open 435 common clone3 __x64_sys_clone3/ptregs +436 common watch_devices __x64_sys_watch_devices # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 25f4de729a6d..243fa18b8d1e 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -406,3 +406,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index dc404492381d..7f899cae41a0 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -1,6 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 menu "Generic Driver Options" +config DEVICE_NOTIFICATIONS + bool "Provide device event notifications" + depends on WATCH_QUEUE + help + This option provides support for getting hardware event notifications + on devices, buses and interfaces. This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + device_notify(2) is used to set/remove watches. 
+ config UEVENT_HELPER bool "Support for uevent helper" help diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..4db2e8f1a1f4 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -7,6 +7,7 @@ obj-y := component.o core.o bus.o dd.o syscore.o \ attribute_container.o transport_class.o \ topology.o container.o property.o cacheinfo.o \ devcon.o swnode.o +obj-$(CONFIG_DEVICE_NOTIFICATIONS) += watch.o obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o diff --git a/drivers/base/watch.c b/drivers/base/watch.c new file mode 100644 index 000000000000..725aaa24275b --- /dev/null +++ b/drivers/base/watch.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Event notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/device.h> +#include <linux/watch_queue.h> +#include <linux/syscalls.h> +#include <linux/init_task.h> +#include <linux/security.h> + +/* + * Global queue for watching for device layer events. + */ +static struct watch_list device_watchers = { + .watchers = HLIST_HEAD_INIT, + .lock = __SPIN_LOCK_UNLOCKED(&device_watchers.lock), +}; + +static DEFINE_SPINLOCK(device_watchers_lock); + +/** + * post_device_notification - Post notification of a device event + * @n - The notification to post + * @id - The device ID + * + * Note that there's only a global queue to which all events are posted. Might + * want to provide per-dev queues also. + */ +void post_device_notification(struct watch_notification *n, u64 id) +{ + post_watch_notification(&device_watchers, n, &init_cred, id); +} +EXPORT_SYMBOL(post_device_notification); + +/** + * sys_watch_devices - Watch for device events. + * @watch_fd: The watch queue to send notifications to. 
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch) + * @flags: Flags (reserved for future) + */ +SYSCALL_DEFINE3(watch_devices, int, watch_fd, int, watch_id, unsigned int, flags) +{ + struct watch_queue *wqueue; + struct watch *watch = NULL; + long ret = -ENOMEM; + + if (watch_id < -1 || watch_id > 0xff || flags) + return -EINVAL; + + wqueue = get_watch_queue(watch_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err; + } + + if (watch_id >= 0) { + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wqueue; + + init_watch(watch, wqueue); + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_devices(); + if (ret < 0) + goto err_watch; + + spin_lock(&device_watchers_lock); + ret = add_watch_to_object(watch, &device_watchers); + spin_unlock(&device_watchers_lock); + if (ret = 0) + watch = NULL; + } else { + spin_lock(&device_watchers_lock); + ret = remove_watch_from_object(&device_watchers, wqueue, 0, + false); + spin_unlock(&device_watchers_lock); + } + +err_watch: + kfree(watch); +err_wqueue: + put_watch_queue(wqueue); +err: + return ret; +} diff --git a/include/linux/device.h b/include/linux/device.h index 6717adee33f0..9def6a53b598 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -43,6 +43,7 @@ struct iommu_group; struct iommu_fwspec; struct dev_pin_info; struct iommu_param; +struct watch_notification; struct bus_attribute { struct attribute attr; @@ -1412,6 +1413,12 @@ struct device_link *device_link_add(struct device *consumer, void device_link_del(struct device_link *link); void device_link_remove(void *consumer, struct device *supplier); +#ifdef CONFIG_DEVICE_NOTIFICATIONS +extern void post_device_notification(struct watch_notification *n, u64 id); +#else +static inline void post_device_notification(struct watch_notification *n, u64 id) {} +#endif + #ifndef dev_fmt #define dev_fmt(fmt) fmt #endif diff --git a/include/linux/syscalls.h 
b/include/linux/syscalls.h index 88145da7d140..5bac5daec51e 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags) asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); +asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1be0e798e362..fd63ff0196fd 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) #endif +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) #undef __NR_syscalls -#define __NR_syscalls 436 +#define __NR_syscalls 437 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 34b76895b81e..184ad68c087f 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents); COND_SYSCALL(io_uring_setup); COND_SYSCALL(io_uring_enter); COND_SYSCALL(io_uring_register); +COND_SYSCALL(watch_devices); /* fs/xattr.c */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
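The argument checks at the top of sys_watch_devices() in the patch above can be mirrored in userspace to reject bad requests before they reach the kernel. What follows is only a minimal sketch, assuming the limits stay as posted in this version (watch_id in the range -1 to 0xff, flags zero). ex_classify_watch_request() and the ex_watch_request enum are hypothetical names used for illustration and are not part of the patch.

```c
#include <errno.h>

/* Hypothetical classification of a watch_devices() request. */
enum ex_watch_request {
	EX_WATCH_ADD,		/* watch_id >= 0: add a watch with that ID */
	EX_WATCH_REMOVE,	/* watch_id == -1: remove the watch */
};

/*
 * Mirror the checks made by sys_watch_devices() as posted: watch_id must
 * lie in [-1, 0xff] and flags must be zero.  Returns 0 and fills in *req
 * on success, or -EINVAL if the kernel would reject the arguments.
 */
static int ex_classify_watch_request(int watch_id, unsigned int flags,
				     enum ex_watch_request *req)
{
	if (watch_id < -1 || watch_id > 0xff || flags)
		return -EINVAL;

	*req = (watch_id >= 0) ? EX_WATCH_ADD : EX_WATCH_REMOVE;
	return 0;
}
```

A caller would run this check first and only then issue the real system call, which also has to pass the LSM hook and the watch queue lookup that this sketch does not model.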
* [PATCH 07/11] block: Add block layer notifications [ver #8]
From: David Howells @ 2019-09-04 22:16 UTC
To: keyrings, linux-usb, linux-block
Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, linux-security-module, linux-fsdevel, linux-api, linux-kernel

Add a block layer notification mechanism whereby notifications about
block-layer events, such as I/O errors, can be reported to a monitoring
process asynchronously.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a notification can be set up to report block notifications via that
queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_BLOCK_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	watch_devices(fd, 12, 0);

After that, records will be placed into the queue when, for example, errors
occur on a block device. Records are of the following format:

	struct block_notification {
		struct watch_notification watch;
		__u64	dev;
		__u64	sector;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_BLOCK_NOTIFY

	n->watch.subtype will be the type of notification, such as
	NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the second argument to
	watch_devices(), shifted.

	n->dev will be the device numbers munged together.

	n->sector will indicate the affected sector (if appropriate for the
	event).

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.

Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/watch_queue.rst | 4 +++- block/Kconfig | 9 +++++++++ block/blk-core.c | 29 +++++++++++++++++++++++++++++ include/linux/blkdev.h | 15 +++++++++++++++ include/uapi/linux/watch_queue.h | 30 +++++++++++++++++++++++++++++- 5 files changed, 85 insertions(+), 2 deletions(-) diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 393905b904c8..5cc9c6924727 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -7,7 +7,9 @@ receive notifications from the kernel. This can be used in conjunction with:: * Key/keyring notifications - * General device event notifications + * General device event notifications, including:: + + * Block layer event notifications The notifications buffers can be enabled by: diff --git a/block/Kconfig b/block/Kconfig index 8b5f8e560eb4..cc93e4ca29a7 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -164,6 +164,15 @@ config BLK_SED_OPAL Enabling this option enables users to setup/unlock/lock Locking ranges for SED devices using the Opal protocol. +config BLK_NOTIFICATIONS + bool "Block layer event notifications" + depends on DEVICE_NOTIFICATIONS + help + This option provides support for getting block layer event + notifications. This makes use of the /dev/watch_queue misc device to + handle the notification buffer and provides the watch_devices() system + call to enable/disable watches.
+ menu "Partition Types" source "block/partitions/Kconfig" diff --git a/block/blk-core.c b/block/blk-core.c index d0cc6e14d2f0..8ab1e07aa311 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -181,6 +181,22 @@ static const struct { [BLK_STS_IOERR] = { -EIO, "I/O" }, }; +#ifdef CONFIG_BLK_NOTIFICATIONS +static const +enum block_notification_type blk_notifications[ARRAY_SIZE(blk_errors)] = { + [BLK_STS_TIMEOUT] = NOTIFY_BLOCK_ERROR_TIMEOUT, + [BLK_STS_NOSPC] = NOTIFY_BLOCK_ERROR_NO_SPACE, + [BLK_STS_TRANSPORT] = NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT, + [BLK_STS_TARGET] = NOTIFY_BLOCK_ERROR_CRITICAL_TARGET, + [BLK_STS_NEXUS] = NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS, + [BLK_STS_MEDIUM] = NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM, + [BLK_STS_PROTECTION] = NOTIFY_BLOCK_ERROR_PROTECTION, + [BLK_STS_RESOURCE] = NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE, + [BLK_STS_DEV_RESOURCE] = NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE, + [BLK_STS_IOERR] = NOTIFY_BLOCK_ERROR_IO, +}; +#endif + blk_status_t errno_to_blk_status(int errno) { int i; @@ -221,6 +237,19 @@ static void print_req_error(struct request *req, blk_status_t status, req->cmd_flags & ~REQ_OP_MASK, req->nr_phys_segments, IOPRIO_PRIO_CLASS(req->ioprio)); + +#ifdef CONFIG_BLK_NOTIFICATIONS + if (blk_notifications[idx]) { + struct block_notification n = { + .watch.type = WATCH_TYPE_BLOCK_NOTIFY, + .watch.subtype = blk_notifications[idx], + .watch.info = watch_sizeof(n), + .dev = req->rq_disk ? 
disk_devt(req->rq_disk) : 0, + .sector = blk_rq_pos(req), + }; + post_block_notification(&n); + } +#endif } static void req_bio_endio(struct request *rq, struct bio *bio, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1ef375dafb1c..5d856f670a8f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -27,6 +27,7 @@ #include <linux/percpu-refcount.h> #include <linux/scatterlist.h> #include <linux/blkzoned.h> +#include <linux/watch_queue.h> struct module; struct scsi_ioctl_command; @@ -1742,6 +1743,20 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq) } #endif /* CONFIG_BLK_DEV_ZONED */ +#ifdef CONFIG_BLK_NOTIFICATIONS +static inline void post_block_notification(struct block_notification *n) +{ + u64 id = 0; /* Might want to allow dev# here. */ + + post_device_notification(&n->watch, id); +} +#else +static inline void post_block_notification(struct block_notification *n) +{ +} +#endif + + #else /* CONFIG_BLOCK */ struct block_device; diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 654d4ba8b909..9a6c059af09d 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -11,7 +11,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ - WATCH_TYPE___NR = 2 + WATCH_TYPE_BLOCK_NOTIFY = 2, /* Block layer event notification */ + WATCH_TYPE___NR = 3 }; enum watch_meta_notification_subtype { @@ -124,4 +125,31 @@ struct key_notification { __u32 aux; /* Per-type auxiliary data */ }; +/* + * Type of block layer notification. 
+ */ +enum block_notification_type { + NOTIFY_BLOCK_ERROR_TIMEOUT = 1, /* Timeout error */ + NOTIFY_BLOCK_ERROR_NO_SPACE = 2, /* Critical space allocation error */ + NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT = 3, /* Recoverable transport error */ + NOTIFY_BLOCK_ERROR_CRITICAL_TARGET = 4, /* Critical target error */ + NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS = 5, /* Critical nexus error */ + NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM = 6, /* Critical medium error */ + NOTIFY_BLOCK_ERROR_PROTECTION = 7, /* Protection error */ + NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE = 8, /* Kernel resource error */ + NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE = 9, /* Device resource error */ + NOTIFY_BLOCK_ERROR_IO = 10, /* Other I/O error */ +}; + +/* + * Block layer notification record. + * - watch.type = WATCH_TYPE_BLOCK_NOTIFY + * - watch.subtype = enum block_notification_type + */ +struct block_notification { + struct watch_notification watch; /* WATCH_TYPE_BLOCK_NOTIFY */ + __u64 dev; /* Device number */ + __u64 sector; /* Affected sector */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
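On the consumer side, a monitoring process has to turn the subtype and the "munged together" device number of a block_notification record back into something readable. The sketch below shows one way this might look. It assumes that n->dev carries the kernel-internal dev_t produced by disk_devt(), with the major number above a 20-bit minor field, and it uses local ex_* copies of the enum values from the patch so that it builds without the new UAPI header. All ex_* names are hypothetical.

```c
#include <stdint.h>

/* Local mirror of enum block_notification_type from the patch above. */
enum ex_block_notification_type {
	EX_NOTIFY_BLOCK_ERROR_TIMEOUT			= 1,
	EX_NOTIFY_BLOCK_ERROR_NO_SPACE			= 2,
	EX_NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT	= 3,
	EX_NOTIFY_BLOCK_ERROR_CRITICAL_TARGET		= 4,
	EX_NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS		= 5,
	EX_NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM		= 6,
	EX_NOTIFY_BLOCK_ERROR_PROTECTION		= 7,
	EX_NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE		= 8,
	EX_NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE		= 9,
	EX_NOTIFY_BLOCK_ERROR_IO			= 10,
};

/* Map a record subtype to the description used in the header comments. */
static const char *ex_block_subtype_name(unsigned int subtype)
{
	switch (subtype) {
	case EX_NOTIFY_BLOCK_ERROR_TIMEOUT:		return "timeout error";
	case EX_NOTIFY_BLOCK_ERROR_NO_SPACE:		return "critical space allocation error";
	case EX_NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT: return "recoverable transport error";
	case EX_NOTIFY_BLOCK_ERROR_CRITICAL_TARGET:	return "critical target error";
	case EX_NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS:	return "critical nexus error";
	case EX_NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM:	return "critical medium error";
	case EX_NOTIFY_BLOCK_ERROR_PROTECTION:		return "protection error";
	case EX_NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE:	return "kernel resource error";
	case EX_NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE:	return "device resource error";
	case EX_NOTIFY_BLOCK_ERROR_IO:			return "other I/O error";
	default:					return "unknown";
	}
}

/* Assumption: n->dev is a kernel-internal dev_t with a 20-bit minor field. */
#define EX_MINORBITS 20

static unsigned int ex_dev_major(uint64_t dev)
{
	return (unsigned int)(dev >> EX_MINORBITS);
}

static unsigned int ex_dev_minor(uint64_t dev)
{
	return (unsigned int)(dev & ((1u << EX_MINORBITS) - 1));
}
```

If the encoding of n->dev were changed to the userspace dev_t layout in a later version, only the two ex_dev_*() helpers would need to change.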
* [PATCH 08/11] usb: Add USB subsystem notifications [ver #8]
From: David Howells @ 2019-09-04 22:16 UTC
To: keyrings, linux-usb, linux-block
Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, linux-security-module, linux-fsdevel, linux-api, linux-kernel

Add a USB subsystem notification mechanism whereby notifications about
hardware events, such as device connection, disconnection, reset and I/O
errors, can be reported to a monitoring process asynchronously.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a notification can be set up to report USB notifications via that
queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_USB_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	watch_devices(fd, 12, 0);

After that, records will be placed into the queue when events occur on a
USB device or bus. Records are of the following format:

	struct usb_notification {
		struct watch_notification watch;
		__u32	error;
		__u32	reserved;
		__u8	name_len;
		__u8	name[0];
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_USB_NOTIFY

	n->watch.subtype will be the type of notification, such as
	NOTIFY_USB_DEVICE_ADD.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the second argument to
	watch_devices(), shifted.

	n->error and n->reserved are intended to convey information such as
	error codes, but are currently not used.

	n->name_len and n->name convey the USB device name as an
	unterminated string. This may be truncated - it is currently
	limited to a maximum of 63 chars.

Note that it is permissible for event records to be of variable length - or, at least, the length may be dependent on the subtype. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: linux-usb@vger.kernel.org --- Documentation/watch_queue.rst | 9 +++++++ drivers/usb/core/Kconfig | 9 +++++++ drivers/usb/core/devio.c | 49 ++++++++++++++++++++++++++++++++++++++ drivers/usb/core/hub.c | 4 +++ include/linux/usb.h | 18 ++++++++++++++ include/uapi/linux/watch_queue.h | 28 +++++++++++++++++++++- 6 files changed, 116 insertions(+), 1 deletion(-) diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 5cc9c6924727..4087a8e670a8 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -11,6 +11,8 @@ receive notifications from the kernel. This can be used in conjunction with:: * Block layer event notifications + * USB subsystem event notifications + The notifications buffers can be enabled by: @@ -315,6 +317,13 @@ Any particular buffer can be fed from multiple sources. Sources include: or temporary link loss. Watches of this type are set on the global device watch list. + * WATCH_TYPE_USB_NOTIFY + + Notifications of this type indicate USB subsystem events, such as + attachment, removal, reset and I/O errors. Separate events are generated + for buses and devices. Watchpoints of this type are set on the global + device watch list. + Event Filtering =============== diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig index ecaacc8ed311..57e7b649e48b 100644 --- a/drivers/usb/core/Kconfig +++ b/drivers/usb/core/Kconfig @@ -102,3 +102,12 @@ config USB_AUTOSUSPEND_DELAY The default value Linux has always had is 2 seconds. Change this value if you want a different delay and cannot modify the command line or module parameter. 
+ +config USB_NOTIFICATIONS + bool "Provide USB hardware event notifications" + depends on USB && DEVICE_NOTIFICATIONS + help + This option provides support for getting hardware event notifications + on USB devices and interfaces. This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + watch_devices(2) is used to set/remove watches. diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 9063ede411ae..21c07d76f7d7 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -41,6 +41,7 @@ #include <linux/dma-mapping.h> #include <asm/byteorder.h> #include <linux/moduleparam.h> +#include <linux/watch_queue.h> #include "usb.h" @@ -2660,13 +2661,61 @@ static void usbdev_remove(struct usb_device *udev) } } +#ifdef CONFIG_USB_NOTIFICATIONS +static noinline void post_usb_notification(const char *devname, + enum usb_notification_type subtype, + u32 error) +{ + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int name_len, n_len; + u64 id = 0; /* Might want to put a dev# here.
*/ + + struct { + struct usb_notification n; + char more_name[USB_NOTIFICATION_MAX_NAME_LEN - + (sizeof(struct usb_notification) - + offsetof(struct usb_notification, name))]; + } n; + + name_len = strlen(devname); + name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN); + n_len = round_up(offsetof(struct usb_notification, name) + name_len, + gran) / gran; + + memset(&n, 0, sizeof(n)); + memcpy(n.n.name, devname, name_len); + + n.n.watch.type = WATCH_TYPE_USB_NOTIFY; + n.n.watch.subtype = subtype; + n.n.watch.info = n_len; + n.n.error = error; + n.n.name_len = name_len; + + post_device_notification(&n.n.watch, id); +} + +void post_usb_device_notification(const struct usb_device *udev, + enum usb_notification_type subtype, u32 error) +{ + post_usb_notification(dev_name(&udev->dev), subtype, error); +} + +void post_usb_bus_notification(const struct usb_bus *ubus, + enum usb_notification_type subtype, u32 error) +{ + post_usb_notification(ubus->bus_name, subtype, error); +} +#endif + static int usbdev_notify(struct notifier_block *self, unsigned long action, void *dev) { switch (action) { case USB_DEVICE_ADD: + post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0); break; case USB_DEVICE_REMOVE: + post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0); usbdev_remove(dev); break; } diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 236313f41f4a..e8ebacc15a32 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -29,6 +29,7 @@ #include <linux/random.h> #include <linux/pm_qos.h> #include <linux/kobject.h> +#include <linux/watch_queue.h> #include <linux/uaccess.h> #include <asm/byteorder.h> @@ -4605,6 +4606,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, (udev->config) ?
"reset" : "new", speed, devnum, driver_name); + if (udev->config) + post_usb_device_notification(udev, NOTIFY_USB_DEVICE_RESET, 0); + /* Set up TT records, if needed */ if (hdev->tt) { udev->tt = hdev->tt; diff --git a/include/linux/usb.h b/include/linux/usb.h index e87826e23d59..ddfb9dc2473e 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -26,6 +26,7 @@ struct usb_device; struct usb_driver; struct wusb_dev; +enum usb_notification_type; /*-------------------------------------------------------------------------*/ @@ -2010,6 +2011,23 @@ extern void usb_led_activity(enum usb_led_event ev); static inline void usb_led_activity(enum usb_led_event ev) {} #endif +/* + * Notification functions. + */ +#ifdef CONFIG_USB_NOTIFICATIONS +extern void post_usb_device_notification(const struct usb_device *udev, + enum usb_notification_type subtype, + u32 error); +extern void post_usb_bus_notification(const struct usb_bus *ubus, + enum usb_notification_type subtype, + u32 error); +#else +static inline void post_usb_device_notification(const struct usb_device *udev, + unsigned int subtype, u32 error) {} +static inline void post_usb_bus_notification(const struct usb_bus *ubus, + unsigned int subtype, u32 error) {} +#endif + #endif /* __KERNEL__ */ #endif diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 9a6c059af09d..baa4b3ead006 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -12,7 +12,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ WATCH_TYPE_BLOCK_NOTIFY = 2, /* Block layer event notification */ - WATCH_TYPE___NR = 3 + WATCH_TYPE_USB_NOTIFY = 3, /* USB subsystem event notification */ + WATCH_TYPE___NR = 4 }; enum watch_meta_notification_subtype { @@ -152,4 +153,29 @@ struct block_notification { __u64 sector; /* Affected sector */ }; +/* + * Type of USB layer notification. 
+ */ +enum usb_notification_type { + NOTIFY_USB_DEVICE_ADD = 0, /* USB device added */ + NOTIFY_USB_DEVICE_REMOVE = 1, /* USB device removed */ + NOTIFY_USB_DEVICE_RESET = 2, /* USB device reset */ + NOTIFY_USB_DEVICE_ERROR = 3, /* USB device error */ +}; + +/* + * USB subsystem notification record. + * - watch.type = WATCH_TYPE_USB_NOTIFY + * - watch.subtype = enum usb_notification_type + */ +struct usb_notification { + struct watch_notification watch; /* WATCH_TYPE_USB_NOTIFY */ + __u32 error; + __u32 reserved; + __u8 name_len; /* Length of device name */ + __u8 name[0]; /* Device name (padded to __u64, truncated at 63 chars) */ +}; + +#define USB_NOTIFICATION_MAX_NAME_LEN 63 + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 08/11] usb: Add USB subsystem notifications [ver #8]
From: David Howells @ 2019-09-04 22:16 UTC
To: keyrings, linux-usb, linux-block
Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
    Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner

Add a USB subsystem notification mechanism whereby notifications about
hardware events such as device connection, disconnection, reset and I/O
errors can be reported to a monitoring process asynchronously.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);

then a notification can be set up to report USB notifications via that
queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_USB_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	watch_devices(fd, 12, 0);

After that, records will be placed into the queue when events occur on a
USB device or bus.  Records are of the following format:

	struct usb_notification {
		struct watch_notification watch;
		__u32 error;
		__u32 reserved;
		__u8 name_len;
		__u8 name[0];
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_USB_NOTIFY.

	n->watch.subtype will be the type of notification, such as
	NOTIFY_USB_DEVICE_ADD.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the second argument to
	watch_devices(), shifted.

	n->error and n->reserved are intended to convey information such as
	error codes, but are currently not used.

	n->name_len and n->name convey the USB device name as an
	unterminated string.  This may be truncated - it is currently
	limited to a maximum of 63 chars.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: linux-usb@vger.kernel.org
---

 Documentation/watch_queue.rst    |    9 +++++++
 drivers/usb/core/Kconfig         |    9 +++++++
 drivers/usb/core/devio.c         |   49 ++++++++++++++++++++++++++++++++++++++
 drivers/usb/core/hub.c           |    4 +++
 include/linux/usb.h              |   18 ++++++++++++++
 include/uapi/linux/watch_queue.h |   28 +++++++++++++++++++++-
 6 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 5cc9c6924727..4087a8e670a8 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -11,6 +11,8 @@ receive notifications from the kernel.  This can be used in conjunction with::
 
   * Block layer event notifications
 
+  * USB subsystem event notifications
+
 
 The notifications buffers can be enabled by:
 
@@ -315,6 +317,13 @@ Any particular buffer can be fed from multiple sources.  Sources include:
     or temporary link loss.  Watches of this type are set on the global device
     watch list.
 
+  * WATCH_TYPE_USB_NOTIFY
+
+    Notifications of this type indicate USB subsystem events, such as
+    attachment, removal, reset and I/O errors.  Separate events are generated
+    for buses and devices.  Watchpoints of this type are set on the global
+    device watch list.
+
 
 Event Filtering
 ===============
diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig
index ecaacc8ed311..57e7b649e48b 100644
--- a/drivers/usb/core/Kconfig
+++ b/drivers/usb/core/Kconfig
@@ -102,3 +102,12 @@ config USB_AUTOSUSPEND_DELAY
 	  The default value Linux has always had is 2 seconds.  Change
 	  this value if you want a different delay and cannot modify
 	  the command line or module parameter.
+
+config USB_NOTIFICATIONS
+	bool "Provide USB hardware event notifications"
+	depends on USB && DEVICE_NOTIFICATIONS
+	help
+	  This option provides support for getting hardware event notifications
+	  on USB devices and interfaces.  This makes use of the
+	  /dev/watch_queue misc device to handle the notification buffer.
+	  watch_devices(2) is used to set/remove watches.
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 9063ede411ae..21c07d76f7d7 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -41,6 +41,7 @@
 #include <linux/dma-mapping.h>
 #include <asm/byteorder.h>
 #include <linux/moduleparam.h>
+#include <linux/watch_queue.h>
 
 #include "usb.h"
 
@@ -2660,13 +2661,61 @@ static void usbdev_remove(struct usb_device *udev)
 	}
 }
 
+#ifdef CONFIG_USB_NOTIFICATIONS
+static noinline void post_usb_notification(const char *devname,
+					   enum usb_notification_type subtype,
+					   u32 error)
+{
+	unsigned int gran = WATCH_LENGTH_GRANULARITY;
+	unsigned int name_len, n_len;
+	u64 id = 0; /* Might want to put a dev# here. */
+
+	struct {
+		struct usb_notification n;
+		char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
+			       (sizeof(struct usb_notification) -
+				offsetof(struct usb_notification, name))];
+	} n;
+
+	name_len = strlen(devname);
+	name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
+	n_len = round_up(offsetof(struct usb_notification, name) + name_len,
+			 gran) / gran;
+
+	memset(&n, 0, sizeof(n));
+	memcpy(n.n.name, devname, name_len);
+
+	n.n.watch.type = WATCH_TYPE_USB_NOTIFY;
+	n.n.watch.subtype = subtype;
+	n.n.watch.info = n_len;
+	n.n.error = error;
+	n.n.name_len = name_len;
+
+	post_device_notification(&n.n.watch, id);
+}
+
+void post_usb_device_notification(const struct usb_device *udev,
+				  enum usb_notification_type subtype, u32 error)
+{
+	post_usb_notification(dev_name(&udev->dev), subtype, error);
+}
+
+void post_usb_bus_notification(const struct usb_bus *ubus,
+			       enum usb_notification_type subtype, u32 error)
+{
+	post_usb_notification(ubus->bus_name, subtype, error);
+}
+#endif
+
 static int usbdev_notify(struct notifier_block *self,
 			       unsigned long action, void *dev)
 {
 	switch (action) {
 	case USB_DEVICE_ADD:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
 		break;
 	case USB_DEVICE_REMOVE:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
 		usbdev_remove(dev);
 		break;
 	}
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 236313f41f4a..e8ebacc15a32 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -29,6 +29,7 @@
 #include <linux/random.h>
 #include <linux/pm_qos.h>
 #include <linux/kobject.h>
+#include <linux/watch_queue.h>
 
 #include <linux/uaccess.h>
 #include <asm/byteorder.h>
@@ -4605,6 +4606,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
 				(udev->config) ? "reset" : "new", speed,
 				devnum, driver_name);
 
+		if (udev->config)
+			post_usb_device_notification(udev, NOTIFY_USB_DEVICE_RESET, 0);
+
 		/* Set up TT records, if needed */
 		if (hdev->tt) {
 			udev->tt = hdev->tt;
diff --git a/include/linux/usb.h b/include/linux/usb.h
index e87826e23d59..ddfb9dc2473e 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -26,6 +26,7 @@
 struct usb_device;
 struct usb_driver;
 struct wusb_dev;
+enum usb_notification_type;
 
 /*-------------------------------------------------------------------------*/
 
@@ -2010,6 +2011,23 @@ extern void usb_led_activity(enum usb_led_event ev);
 static inline void usb_led_activity(enum usb_led_event ev) {}
 #endif
 
+/*
+ * Notification functions.
+ */
+#ifdef CONFIG_USB_NOTIFICATIONS
+extern void post_usb_device_notification(const struct usb_device *udev,
+					 enum usb_notification_type subtype,
+					 u32 error);
+extern void post_usb_bus_notification(const struct usb_bus *ubus,
+				      enum usb_notification_type subtype,
+				      u32 error);
+#else
+static inline void post_usb_device_notification(const struct usb_device *udev,
+						unsigned int subtype, u32 error) {}
+static inline void post_usb_bus_notification(const struct usb_bus *ubus,
+					     unsigned int subtype, u32 error) {}
+#endif
+
 #endif /* __KERNEL__ */
 
 #endif
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 9a6c059af09d..baa4b3ead006 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -12,7 +12,8 @@ enum watch_notification_type {
 	WATCH_TYPE_META = 0, /* Special record */
 	WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */
 	WATCH_TYPE_BLOCK_NOTIFY = 2, /* Block layer event notification */
-	WATCH_TYPE___NR = 3
+	WATCH_TYPE_USB_NOTIFY = 3, /* USB subsystem event notification */
+	WATCH_TYPE___NR = 4
 };
 
 enum watch_meta_notification_subtype {
@@ -152,4 +153,29 @@ struct block_notification {
 	__u64 sector; /* Affected sector */
 };
 
+/*
+ * Type of USB layer notification.
+ */
+enum usb_notification_type {
+	NOTIFY_USB_DEVICE_ADD = 0, /* USB device added */
+	NOTIFY_USB_DEVICE_REMOVE = 1, /* USB device removed */
+	NOTIFY_USB_DEVICE_RESET = 2, /* USB device reset */
+	NOTIFY_USB_DEVICE_ERROR = 3, /* USB device error */
+};
+
+/*
+ * USB subsystem notification record.
+ * - watch.type = WATCH_TYPE_USB_NOTIFY
+ * - watch.subtype = enum usb_notification_type
+ */
+struct usb_notification {
+	struct watch_notification watch; /* WATCH_TYPE_USB_NOTIFY */
+	__u32 error;
+	__u32 reserved;
+	__u8 name_len; /* Length of device name */
+	__u8 name[0]; /* Device name (padded to __u64, truncated at 63 chars) */
+};
+
+#define USB_NOTIFICATION_MAX_NAME_LEN 63
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
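
The length arithmetic in post_usb_notification() can be checked in isolation.
The sketch below is only an illustration under stated assumptions: it assumes
a granularity of 8 bytes and an 8-byte struct watch_notification header (the
real values come from WATCH_LENGTH_GRANULARITY and <linux/watch_queue.h>), and
every ex_/EX_ name is a hypothetical local mirror, not something the patch
defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, for illustration only. */
#define EX_GRANULARITY  8u   /* stand-in for WATCH_LENGTH_GRANULARITY */
#define EX_MAX_NAME_LEN 63u  /* mirrors USB_NOTIFICATION_MAX_NAME_LEN */

/* Hypothetical 8-byte stand-in for struct watch_notification. */
struct ex_watch_notification {
	uint32_t type_subtype;
	uint32_t info;
};

/* Local mirror of struct usb_notification from the patch. */
struct ex_usb_notification {
	struct ex_watch_notification watch;
	uint32_t error;
	uint32_t reserved;
	uint8_t name_len;
	uint8_t name[];  /* unterminated device name */
};

/* Record length in granules: clamp the name, add the fixed part, round up. */
static unsigned int ex_usb_record_slots(size_t name_len)
{
	size_t bytes;

	if (name_len > EX_MAX_NAME_LEN)
		name_len = EX_MAX_NAME_LEN;
	bytes = offsetof(struct ex_usb_notification, name) + name_len;
	return (unsigned int)((bytes + EX_GRANULARITY - 1) / EX_GRANULARITY);
}

/* Copy the unterminated name into a NUL-terminated buffer. */
static size_t ex_usb_copy_name(const struct ex_usb_notification *u,
			       char *out, size_t out_size)
{
	size_t len = u->name_len;

	assert(out != NULL);
	if (out_size == 0)
		return 0;
	if (len > EX_MAX_NAME_LEN)
		len = EX_MAX_NAME_LEN;
	if (len >= out_size)
		len = out_size - 1;
	memcpy(out, u->name, len);
	out[len] = '\0';
	return len;
}
```

Under these assumptions the fixed part of the record is 17 bytes, so a name
such as "1-1" gives 17 + 3 = 20 bytes, which rounds up to 24 bytes, i.e. three
granules.  The number of name bytes copied is name_len, not the granule count.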
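
The filter in the commit message of the USB patch accepts every subtype by
setting subtype_filter[0] to UINT_MAX.  A narrower filter could be built one
bit at a time, as sketched below.  This is a guess at the layout rather than
the kernel's definition: it assumes one bit per subtype, with subtype N stored
in bit N % 32 of word N / 32, and the ex_/EX_ names are hypothetical stand-ins
for the real filter structure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_NR_SUBTYPES 256u  /* up to 256 subtypes per type */

/* Subtype values as given in the patch's enum usb_notification_type. */
enum ex_usb_subtype {
	EX_USB_DEVICE_ADD = 0,
	EX_USB_DEVICE_REMOVE = 1,
	EX_USB_DEVICE_RESET = 2,
	EX_USB_DEVICE_ERROR = 3,
};

/* Hypothetical per-type filter: a type plus a 256-bit subtype bitmap. */
struct ex_type_filter {
	uint32_t type;
	uint32_t subtype_filter[EX_NR_SUBTYPES / 32];
};

/* Mark one subtype as wanted (assumed bit layout, see above). */
static void ex_filter_allow(struct ex_type_filter *f, unsigned int subtype)
{
	assert(subtype < EX_NR_SUBTYPES);
	f->subtype_filter[subtype / 32] |= UINT32_C(1) << (subtype % 32);
}

/* Return 1 if the subtype would pass the filter, 0 otherwise. */
static int ex_filter_allows(const struct ex_type_filter *f,
			    unsigned int subtype)
{
	if (subtype >= EX_NR_SUBTYPES)
		return 0;
	return (int)((f->subtype_filter[subtype / 32] >> (subtype % 32)) & 1u);
}
```

With this layout, allowing only add and remove events sets the low two bits of
subtype_filter[0], whereas UINT_MAX in that word admits subtypes 0 to 31.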
* [PATCH 09/11] Add sample notification program [ver #8]
From: David Howells @ 2019-09-04 22:17 UTC
To: keyrings, linux-usb, linux-block
Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
    Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner,
    dhowells, linux-security-module, linux-fsdevel, linux-api,
    linux-security-module, linux-kernel

This needs to be linked with -lkeyutils.

It is run like:

	./watch_test

and watches the current session keyring for key changes and the global
device watch list for block and USB events:

	# keyctl add user a a @s
	1035096409
	# keyctl unlink 1035096409 @s

producing:

	# ./watch_test
	ptrs h=4 t=2 m=20003
	NOTIFY[00000004-00000002] ty=0003 sy=0002 i=01000010
	KEY 2ffc2e5d change=2[linked] aux=1035096409
	ptrs h=6 t=4 m=20003
	NOTIFY[00000006-00000004] ty=0003 sy=0003 i=01000010
	KEY 2ffc2e5d change=3[unlinked] aux=1035096409

Other events may be produced, such as with a failing disk:

	ptrs h=5 t=2 m=6000004
	NOTIFY[00000005-00000002] ty=0004 sy=0006 i=04000018
	BLOCK 00800050 e=6[critical medium] s=5be8

This corresponds to:

	print_req_error: critical medium error, dev sdf, sector 23528 flags 0

in dmesg.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 samples/Kconfig                  |    6 +
 samples/Makefile                 |    1 
 samples/watch_queue/Makefile     |    8 +
 samples/watch_queue/watch_test.c |  231 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 246 insertions(+)
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c

diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4dda80..2c3e07addd38 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -169,4 +169,10 @@ config SAMPLE_VFS
 	  as mount API and statx().  Note that this is restricted to the x86
 	  arch whilst it accesses system calls that aren't yet in all arches.
 
+config SAMPLE_WATCH_QUEUE
+	bool "Build example /dev/watch_queue notification consumer"
+	help
+	  Build example userspace program to use the new watch_devices()
+	  syscall and the KEYCTL_WATCH_KEY keyctl() function.
+
 endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index 7d6e4ca28d69..a61a39047d02 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_SAMPLE_TRACE_PRINTK) += trace_printk/
 obj-$(CONFIG_VIDEO_PCI_SKELETON) += v4l/
 obj-y += vfio-mdev/
 subdir-$(CONFIG_SAMPLE_VFS) += vfs
+subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue
diff --git a/samples/watch_queue/Makefile b/samples/watch_queue/Makefile
new file mode 100644
index 000000000000..6ee61e3ca8d2
--- /dev/null
+++ b/samples/watch_queue/Makefile
@@ -0,0 +1,8 @@
+# List of programs to build
+hostprogs-y := watch_test
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS_watch_test.o += -I$(objtree)/usr/include
+HOSTLDLIBS_watch_test += -lkeyutils
diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c
new file mode 100644
index 000000000000..ecc191174b7a
--- /dev/null
+++ b/samples/watch_queue/watch_test.c
@@ -0,0 +1,231 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Use /dev/watch_queue to watch for notifications.
+ *
+ * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <stdbool.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <poll.h>
+#include <limits.h>
+#include <linux/watch_queue.h>
+#include <linux/unistd.h>
+#include <linux/keyctl.h>
+
+#ifndef KEYCTL_WATCH_KEY
+#define KEYCTL_WATCH_KEY -1
+#endif
+#ifndef __NR_watch_devices
+#define __NR_watch_devices -1
+#endif
+
+#define BUF_SIZE 4
+
+static long keyctl_watch_key(int key, int watch_fd, int watch_id)
+{
+	return syscall(__NR_keyctl, KEYCTL_WATCH_KEY, key, watch_fd, watch_id);
+}
+
+static const char *key_subtypes[256] = {
+	[NOTIFY_KEY_INSTANTIATED] = "instantiated",
+	[NOTIFY_KEY_UPDATED] = "updated",
+	[NOTIFY_KEY_LINKED] = "linked",
+	[NOTIFY_KEY_UNLINKED] = "unlinked",
+	[NOTIFY_KEY_CLEARED] = "cleared",
+	[NOTIFY_KEY_REVOKED] = "revoked",
+	[NOTIFY_KEY_INVALIDATED] = "invalidated",
+	[NOTIFY_KEY_SETATTR] = "setattr",
+};
+
+static void saw_key_change(struct watch_notification *n)
+{
+	struct key_notification *k = (struct key_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len != sizeof(struct key_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("KEY %08x change=%u[%s] aux=%u\n",
+	       k->key_id, n->subtype, key_subtypes[n->subtype], k->aux);
+}
+
+static const char *block_subtypes[256] = {
+	[NOTIFY_BLOCK_ERROR_TIMEOUT] = "timeout",
+	[NOTIFY_BLOCK_ERROR_NO_SPACE] = "critical space allocation",
+	[NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT] = "recoverable transport",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_TARGET] = "critical target",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS] = "critical nexus",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM] = "critical medium",
+	[NOTIFY_BLOCK_ERROR_PROTECTION] = "protection",
+	[NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE] = "kernel resource",
+	[NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE] = "device resource",
+	[NOTIFY_BLOCK_ERROR_IO] = "I/O",
+};
+
+static void saw_block_change(struct watch_notification *n)
+{
+	struct block_notification *b = (struct block_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len < sizeof(struct block_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("BLOCK %08llx e=%u[%s] s=%llx\n",
+	       (unsigned long long)b->dev,
+	       n->subtype, block_subtypes[n->subtype],
+	       (unsigned long long)b->sector);
+}
+
+static const char *usb_subtypes[256] = {
+	[NOTIFY_USB_DEVICE_ADD] = "dev-add",
+	[NOTIFY_USB_DEVICE_REMOVE] = "dev-remove",
+	[NOTIFY_USB_DEVICE_RESET] = "dev-reset",
+	[NOTIFY_USB_DEVICE_ERROR] = "dev-error",
+};
+
+static void saw_usb_event(struct watch_notification *n)
+{
+	struct usb_notification *u = (struct usb_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len < sizeof(struct usb_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("USB %*.*s %s e=%x r=%x\n",
+	       u->name_len, u->name_len, u->name,
+	       usb_subtypes[n->subtype],
+	       u->error, u->reserved);
+}
+
+/*
+ * Consume and display events.
+ */
+static int consumer(int fd, struct watch_queue_buffer *buf)
+{
+	struct watch_notification *n;
+	struct pollfd p[1];
+	unsigned int head, tail, mask = buf->meta.mask;
+
+	for (;;) {
+		p[0].fd = fd;
+		p[0].events = POLLIN | POLLERR;
+		p[0].revents = 0;
+
+		if (poll(p, 1, -1) == -1) {
+			perror("poll");
+			break;
+		}
+
+		printf("ptrs h=%x t=%x m=%x\n",
+		       buf->meta.head, buf->meta.tail, buf->meta.mask);
+
+		while (head = __atomic_load_n(&buf->meta.head, __ATOMIC_ACQUIRE),
+		       tail = buf->meta.tail,
+		       tail != head
+		       ) {
+			n = &buf->slots[tail & mask];
+			printf("NOTIFY[%08x-%08x] ty=%04x sy=%04x i=%08x\n",
+			       head, tail, n->type, n->subtype, n->info);
+			if ((n->info & WATCH_INFO_LENGTH) == 0)
+				goto out;
+
+			switch (n->type) {
+			case WATCH_TYPE_META:
+				if (n->subtype == WATCH_META_REMOVAL_NOTIFICATION)
+					printf("REMOVAL of watchpoint %08x\n",
+					       (n->info & WATCH_INFO_ID) >>
+					       WATCH_INFO_ID__SHIFT);
+				break;
+			case WATCH_TYPE_KEY_NOTIFY:
+				saw_key_change(n);
+				break;
+			case WATCH_TYPE_BLOCK_NOTIFY:
+				saw_block_change(n);
+				break;
+			case WATCH_TYPE_USB_NOTIFY:
+				saw_usb_event(n);
+				break;
+			}
+
+			tail += (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+			__atomic_store_n(&buf->meta.tail, tail, __ATOMIC_RELEASE);
+		}
+	}
+
+out:
+	return 0;
+}
+
+static struct watch_notification_filter filter = {
+	.nr_filters = 3,
+	.__reserved = 0,
+	.filters = {
+		[0] = {
+			.type = WATCH_TYPE_KEY_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+		[1] = {
+			.type = WATCH_TYPE_BLOCK_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+		[2] = {
+			.type = WATCH_TYPE_USB_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+	},
+};
+
+int main(int argc, char **argv)
+{
+	struct watch_queue_buffer *buf;
+	size_t page_size;
+	int fd;
+
+	fd = open("/dev/watch_queue", O_RDWR);
+	if (fd == -1) {
+		perror("/dev/watch_queue");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE) == -1) {
+		perror("/dev/watch_queue(size)");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) == -1) {
+		perror("/dev/watch_queue(filter)");
+		exit(1);
+	}
+
+	page_size = sysconf(_SC_PAGESIZE);
+	buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
+		   MAP_SHARED, fd, 0);
+	if (buf == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	if (keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01) == -1) {
+		perror("keyctl");
+		exit(1);
+	}
+
+	if (syscall(__NR_watch_devices, fd, 0x04, 0) == -1) {
+		perror("watch_devices");
+		exit(1);
+	}
+
+	return consumer(fd, buf);
+}
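
The head/tail walk in the sample's consumer() can be exercised without the
device by running it over an in-memory ring.  The following is a minimal
sketch under assumptions, not the kernel's buffer layout: the ex_ring type
keeps head, tail and mask in ordinary fields instead of inside a metadata
record, the length mask and shift are placeholder values, and all ex_/EX_
names are hypothetical.  Unlike the sample, it also rejects a record whose
claimed length runs past head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder mask and shift; the real ones are WATCH_INFO_LENGTH and
 * WATCH_INFO_LENGTH__SHIFT from <linux/watch_queue.h>. */
#define EX_INFO_LENGTH       0x0000007fu
#define EX_INFO_LENGTH_SHIFT 0

/* One ring slot, sized like the assumed notification header. */
struct ex_slot {
	uint32_t type_subtype;
	uint32_t info;  /* record length in slots, in the low bits */
};

/* Simplified ring: the producer advances head, the consumer advances tail. */
struct ex_ring {
	uint32_t head;
	uint32_t tail;
	uint32_t mask;  /* slot count minus one; slot count is a power of two */
	struct ex_slot slots[8];
};

typedef void (*ex_record_fn)(const struct ex_slot *rec, void *ctx);

/* Drain the ring; return the number of records seen, or -1 if a record
 * claims a length of zero or a length that runs past head. */
static int ex_consume(struct ex_ring *r, ex_record_fn fn, void *ctx)
{
	int count = 0;

	assert((r->mask & (r->mask + 1)) == 0);

	for (;;) {
		/* Acquire pairs with the producer's release of head. */
		uint32_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);
		uint32_t tail = r->tail;
		const struct ex_slot *rec;
		uint32_t len;

		if (tail == head)
			break;

		rec = &r->slots[tail & r->mask];
		len = (rec->info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
		if (len == 0 || len > head - tail)
			return -1;

		if (fn)
			fn(rec, ctx);
		count++;

		/* Release hands the consumed slots back to the producer. */
		__atomic_store_n(&r->tail, tail + len, __ATOMIC_RELEASE);
	}
	return count;
}
```

Only tail is written by the consumer, and it is published after the record
has been processed, so the producer never reuses slots that are still being
read.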
* [PATCH 09/11] Add sample notification program [ver #8]
From: David Howells @ 2019-09-04 22:17 UTC
To: keyrings, linux-usb, linux-block
Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
    Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner

This needs to be linked with -lkeyutils.

It is run like:

	./watch_test

and watches the current session keyring for key changes and the global
device watch list for block and USB events:

	# keyctl add user a a @s
	1035096409
	# keyctl unlink 1035096409 @s

producing:

	# ./watch_test
	ptrs h=4 t=2 m=20003
	NOTIFY[00000004-00000002] ty=0003 sy=0002 i=01000010
	KEY 2ffc2e5d change=2[linked] aux=1035096409
	ptrs h=6 t=4 m=20003
	NOTIFY[00000006-00000004] ty=0003 sy=0003 i=01000010
	KEY 2ffc2e5d change=3[unlinked] aux=1035096409

Other events may be produced, such as with a failing disk:

	ptrs h=5 t=2 m=6000004
	NOTIFY[00000005-00000002] ty=0004 sy=0006 i=04000018
	BLOCK 00800050 e=6[critical medium] s=5be8

This corresponds to:

	print_req_error: critical medium error, dev sdf, sector 23528 flags 0

in dmesg.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 samples/Kconfig                  |    6 +
 samples/Makefile                 |    1 
 samples/watch_queue/Makefile     |    8 +
 samples/watch_queue/watch_test.c |  231 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 246 insertions(+)
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c

diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4dda80..2c3e07addd38 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -169,4 +169,10 @@ config SAMPLE_VFS
 	  as mount API and statx().  Note that this is restricted to the x86
 	  arch whilst it accesses system calls that aren't yet in all arches.
 
+config SAMPLE_WATCH_QUEUE
+	bool "Build example /dev/watch_queue notification consumer"
+	help
+	  Build example userspace program to use the new watch_devices()
+	  syscall and the KEYCTL_WATCH_KEY keyctl() function.
+
 endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index 7d6e4ca28d69..a61a39047d02 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_SAMPLE_TRACE_PRINTK) += trace_printk/
 obj-$(CONFIG_VIDEO_PCI_SKELETON) += v4l/
 obj-y += vfio-mdev/
 subdir-$(CONFIG_SAMPLE_VFS) += vfs
+subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue
diff --git a/samples/watch_queue/Makefile b/samples/watch_queue/Makefile
new file mode 100644
index 000000000000..6ee61e3ca8d2
--- /dev/null
+++ b/samples/watch_queue/Makefile
@@ -0,0 +1,8 @@
+# List of programs to build
+hostprogs-y := watch_test
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS_watch_test.o += -I$(objtree)/usr/include
+HOSTLDLIBS_watch_test += -lkeyutils
diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c
new file mode 100644
index 000000000000..ecc191174b7a
--- /dev/null
+++ b/samples/watch_queue/watch_test.c
@@ -0,0 +1,231 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Use /dev/watch_queue to watch for notifications.
+ *
+ * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <stdbool.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <poll.h>
+#include <limits.h>
+#include <linux/watch_queue.h>
+#include <linux/unistd.h>
+#include <linux/keyctl.h>
+
+#ifndef KEYCTL_WATCH_KEY
+#define KEYCTL_WATCH_KEY -1
+#endif
+#ifndef __NR_watch_devices
+#define __NR_watch_devices -1
+#endif
+
+#define BUF_SIZE 4
+
+static long keyctl_watch_key(int key, int watch_fd, int watch_id)
+{
+	return syscall(__NR_keyctl, KEYCTL_WATCH_KEY, key, watch_fd, watch_id);
+}
+
+static const char *key_subtypes[256] = {
+	[NOTIFY_KEY_INSTANTIATED] = "instantiated",
+	[NOTIFY_KEY_UPDATED] = "updated",
+	[NOTIFY_KEY_LINKED] = "linked",
+	[NOTIFY_KEY_UNLINKED] = "unlinked",
+	[NOTIFY_KEY_CLEARED] = "cleared",
+	[NOTIFY_KEY_REVOKED] = "revoked",
+	[NOTIFY_KEY_INVALIDATED] = "invalidated",
+	[NOTIFY_KEY_SETATTR] = "setattr",
+};
+
+static void saw_key_change(struct watch_notification *n)
+{
+	struct key_notification *k = (struct key_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len != sizeof(struct key_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("KEY %08x change=%u[%s] aux=%u\n",
+	       k->key_id, n->subtype, key_subtypes[n->subtype], k->aux);
+}
+
+static const char *block_subtypes[256] = {
+	[NOTIFY_BLOCK_ERROR_TIMEOUT] = "timeout",
+	[NOTIFY_BLOCK_ERROR_NO_SPACE] = "critical space allocation",
+	[NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT] = "recoverable transport",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_TARGET] = "critical target",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS] = "critical nexus",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM] = "critical medium",
+	[NOTIFY_BLOCK_ERROR_PROTECTION] = "protection",
+	[NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE] = "kernel resource",
+	[NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE] = "device resource",
+	[NOTIFY_BLOCK_ERROR_IO] = "I/O",
+};
+
+static void saw_block_change(struct watch_notification *n)
+{
+	struct block_notification *b = (struct block_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len < sizeof(struct block_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("BLOCK %08llx e=%u[%s] s=%llx\n",
+	       (unsigned long long)b->dev,
+	       n->subtype, block_subtypes[n->subtype],
+	       (unsigned long long)b->sector);
+}
+
+static const char *usb_subtypes[256] = {
+	[NOTIFY_USB_DEVICE_ADD] = "dev-add",
+	[NOTIFY_USB_DEVICE_REMOVE] = "dev-remove",
+	[NOTIFY_USB_DEVICE_RESET] = "dev-reset",
+	[NOTIFY_USB_DEVICE_ERROR] = "dev-error",
+};
+
+static void saw_usb_event(struct watch_notification *n)
+{
+	struct usb_notification *u = (struct usb_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len < sizeof(struct usb_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("USB %*.*s %s e=%x r=%x\n",
+	       u->name_len, u->name_len, u->name,
+	       usb_subtypes[n->subtype],
+	       u->error, u->reserved);
+}
+
+/*
+ * Consume and display events.
+ */
+static int consumer(int fd, struct watch_queue_buffer *buf)
+{
+	struct watch_notification *n;
+	struct pollfd p[1];
+	unsigned int head, tail, mask = buf->meta.mask;
+
+	for (;;) {
+		p[0].fd = fd;
+		p[0].events = POLLIN | POLLERR;
+		p[0].revents = 0;
+
+		if (poll(p, 1, -1) == -1) {
+			perror("poll");
+			break;
+		}
+
+		printf("ptrs h=%x t=%x m=%x\n",
+		       buf->meta.head, buf->meta.tail, buf->meta.mask);
+
+		while (head = __atomic_load_n(&buf->meta.head, __ATOMIC_ACQUIRE),
+		       tail = buf->meta.tail,
+		       tail != head
+		       ) {
+			n = &buf->slots[tail & mask];
+			printf("NOTIFY[%08x-%08x] ty=%04x sy=%04x i=%08x\n",
+			       head, tail, n->type, n->subtype, n->info);
+			if ((n->info & WATCH_INFO_LENGTH) == 0)
+				goto out;
+
+			switch (n->type) {
+			case WATCH_TYPE_META:
+				if (n->subtype == WATCH_META_REMOVAL_NOTIFICATION)
+					printf("REMOVAL of watchpoint %08x\n",
+					       (n->info & WATCH_INFO_ID) >>
+					       WATCH_INFO_ID__SHIFT);
+				break;
+			case WATCH_TYPE_KEY_NOTIFY:
+				saw_key_change(n);
+				break;
+			case WATCH_TYPE_BLOCK_NOTIFY:
+				saw_block_change(n);
+				break;
+			case WATCH_TYPE_USB_NOTIFY:
+				saw_usb_event(n);
+				break;
+			}
+
+			tail += (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+			__atomic_store_n(&buf->meta.tail, tail, __ATOMIC_RELEASE);
+		}
+	}
+
+out:
+	return 0;
+}
+
+static struct watch_notification_filter filter = {
+	.nr_filters = 3,
+	.__reserved = 0,
+	.filters = {
+		[0] = {
+			.type = WATCH_TYPE_KEY_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+		[1] = {
+			.type = WATCH_TYPE_BLOCK_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+		[2] = {
+			.type = WATCH_TYPE_USB_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+	},
+};
+
+int main(int argc, char **argv)
+{
+	struct watch_queue_buffer *buf;
+	size_t page_size;
+	int fd;
+
+	fd = open("/dev/watch_queue", O_RDWR);
+	if (fd == -1) {
+		perror("/dev/watch_queue");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE) == -1) {
+		perror("/dev/watch_queue(size)");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER,
&filter) == -1) { + perror("/dev/watch_queue(filter)"); + exit(1); + } + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); + if (buf == MAP_FAILED) { + perror("mmap"); + exit(1); + } + + if (keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01) == -1) { + perror("keyctl"); + exit(1); + } + + if (syscall(__NR_watch_devices, fd, 0x04, 0) == -1) { + perror("watch_devices"); + exit(1); + } + + return consumer(fd, buf); +} ^ permalink raw reply related [flat|nested] 234+ messages in thread
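
As an aside, the head/tail handling that consumer() does in the sample can be
tried without the kernel side. What follows is a minimal userspace sketch, not
code from the patch set: the sk_* names, the eight-slot ring and the SK_INFO_*
masks are all invented for illustration, and the real field layout is whatever
<linux/watch_queue.h> in this series defines. It assumes a GCC or Clang
compiler for the __atomic builtins, as the sample itself does. It only shows
the idea that the producer publishes head with a release store, the consumer
reads head with an acquire load, walks records by the length held in the info
word, and publishes tail back.

```c
#include <stdint.h>

/* Hypothetical info-word layout, for this sketch only. */
#define SK_INFO_LENGTH		0x0000003fu	/* record length in slots */
#define SK_INFO_LENGTH_SHIFT	0
#define SK_INFO_ID		0xff000000u	/* watch ID chosen by userspace */
#define SK_INFO_ID_SHIFT	24

#define SK_NR_SLOTS		8u		/* must be a power of two */

struct sk_record {
	uint32_t type_subtype;	/* unused here */
	uint32_t info;		/* ID and length, packed as above */
};

struct sk_ring {
	uint32_t head;		/* only advanced by the producer */
	uint32_t tail;		/* only advanced by the consumer */
	uint32_t mask;
	struct sk_record slots[SK_NR_SLOTS];
};

static void sk_ring_init(struct sk_ring *r)
{
	r->head = 0;
	r->tail = 0;
	r->mask = SK_NR_SLOTS - 1;
}

/*
 * Producer side: post a record that occupies nslots slots.  Only the first
 * slot carries a header in this sketch.  Returns 0 on success or -1 if there
 * is not enough room, in which case nothing is written (the real queue marks
 * an overrun instead of sleeping).
 */
static int sk_post(struct sk_ring *r, uint32_t id, uint32_t nslots)
{
	uint32_t head = r->head;
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);
	struct sk_record *n;

	if (nslots == 0 || nslots > SK_NR_SLOTS - (head - tail))
		return -1;

	n = &r->slots[head & r->mask];
	n->type_subtype = 0;
	n->info = ((id << SK_INFO_ID_SHIFT) & SK_INFO_ID) |
		  ((nslots << SK_INFO_LENGTH_SHIFT) & SK_INFO_LENGTH);

	/* Make the record visible before the new head is. */
	__atomic_store_n(&r->head, head + nslots, __ATOMIC_RELEASE);
	return 0;
}

/*
 * Consumer side: drain up to max records, storing each record's watch ID in
 * ids[].  Returns the number of records consumed.
 */
static unsigned int sk_consume(struct sk_ring *r, uint32_t *ids,
			       unsigned int max)
{
	unsigned int count = 0;
	uint32_t head, tail, len;

	while (head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE),
	       tail = r->tail,
	       tail != head && count < max) {
		const struct sk_record *n = &r->slots[tail & r->mask];

		len = (n->info & SK_INFO_LENGTH) >> SK_INFO_LENGTH_SHIFT;
		if (len == 0)
			break;	/* a zero-length record would never advance */

		ids[count++] = (n->info & SK_INFO_ID) >> SK_INFO_ID_SHIFT;

		/* Hand the consumed slots back to the producer. */
		tail += len;
		__atomic_store_n(&r->tail, tail, __ATOMIC_RELEASE);
	}
	return count;
}
```

The indices are free-running and only masked when a slot is looked up, so
head - tail gives the number of occupied slots even after the counters wrap.
A caller would set the ring up with sk_ring_init(), post with sk_post() and
drain with sk_consume(); records that would straddle the end of the ring are
not handled specially here because the sketch never touches the payload slots.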
* [PATCH 10/11] selinux: Implement the watch_key security hook [ver #8]
  2019-09-04 22:15 ` David Howells
@ 2019-09-04 22:17 ` David Howells
From: David Howells @ 2019-09-04 22:17 UTC
To: keyrings, linux-usb, linux-block
Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
	Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner,
	dhowells, linux-security-module, linux-fsdevel, linux-api,
	linux-security-module, linux-kernel

Implement the watch_key security hook to make sure that a key grants the
caller View permission in order to set a watch on a key.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
---

 security/selinux/hooks.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 74dd46de01b6..88df06969bed 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6533,6 +6533,17 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer)
 	*_buffer = context;
 	return rc;
 }
+
+#ifdef CONFIG_KEY_NOTIFICATIONS
+static int selinux_watch_key(struct key *key)
+{
+	struct key_security_struct *ksec = key->security;
+	u32 sid = current_sid();
+
+	return avc_has_perm(&selinux_state,
+			    sid, ksec->sid, SECCLASS_KEY, KEY_NEED_VIEW, NULL);
+}
+#endif
 #endif
 
 #ifdef CONFIG_SECURITY_INFINIBAND
@@ -6965,6 +6976,9 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(key_free, selinux_key_free),
 	LSM_HOOK_INIT(key_permission, selinux_key_permission),
 	LSM_HOOK_INIT(key_getsecurity, selinux_key_getsecurity),
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	LSM_HOOK_INIT(watch_key, selinux_watch_key),
+#endif
 #endif
 
 #ifdef CONFIG_AUDIT
* [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [ver #8]
  2019-09-04 22:15 ` David Howells
@ 2019-09-04 22:17 ` David Howells
From: David Howells @ 2019-09-04 22:17 UTC
To: keyrings, linux-usb, linux-block
Cc: dhowells, torvalds, Casey Schaufler, Stephen Smalley,
	Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner,
	dhowells, linux-security-module, linux-fsdevel, linux-api,
	linux-security-module, linux-kernel

Implement the watch_key security hook in Smack to make sure that a key
grants the caller Read permission in order to set a watch on a key.

Also implement the post_notification security hook to make sure that the
notification source is granted Write permission by the watch queue.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
---

 include/linux/lsm_audit.h  |    1 +
 security/smack/smack_lsm.c |   82 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 915330abf6e5..734d67889826 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -74,6 +74,7 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_FILE	12
 #define LSM_AUDIT_DATA_IBPKEY	13
 #define LSM_AUDIT_DATA_IBENDPORT 14
+#define LSM_AUDIT_DATA_NOTIFICATION 15
 	union {
 		struct path path;
 		struct dentry *dentry;
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 4c5e5a438f8b..1c2a908c6446 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -4274,7 +4274,7 @@ static int smack_key_permission(key_ref_t key_ref,
 	if (tkp == NULL)
 		return -EACCES;
 
-	if (smack_privileged_cred(CAP_MAC_OVERRIDE, cred))
+	if (smack_privileged(CAP_MAC_OVERRIDE))
 		return 0;
 
 #ifdef CONFIG_AUDIT
@@ -4320,8 +4320,81 @@ static int smack_key_getsecurity(struct key *key, char **_buffer)
 	return length;
 }
+
+#ifdef CONFIG_KEY_NOTIFICATIONS
+/**
+ * smack_watch_key - Smack access to watch a key for notifications.
+ * @key: The key to be watched
+ *
+ * Return 0 if the @watch->cred has permission to read from the key object and
+ * an error otherwise.
+ */
+static int smack_watch_key(struct key *key)
+{
+	struct smk_audit_info ad;
+	struct smack_known *tkp = smk_of_current();
+	int rc;
+
+	if (key == NULL)
+		return -EINVAL;
+	/*
+	 * If the key hasn't been initialized give it access so that
+	 * it may do so.
+	 */
+	if (key->security == NULL)
+		return 0;
+	/*
+	 * This should not occur
+	 */
+	if (tkp == NULL)
+		return -EACCES;
+
+	if (smack_privileged_cred(CAP_MAC_OVERRIDE, current_cred()))
+		return 0;
+
+#ifdef CONFIG_AUDIT
+	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_KEY);
+	ad.a.u.key_struct.key = key->serial;
+	ad.a.u.key_struct.key_desc = key->description;
+#endif
+	rc = smk_access(tkp, key->security, MAY_READ, &ad);
+	rc = smk_bu_note("key watch", tkp, key->security, MAY_READ, rc);
+	return rc;
+}
+#endif /* CONFIG_KEY_NOTIFICATIONS */
 #endif /* CONFIG_KEYS */
 
+#ifdef CONFIG_WATCH_QUEUE
+/**
+ * smack_post_notification - Smack access to post a notification to a queue
+ * @w_cred: The credentials of the watcher.
+ * @cred: The credentials of the event source (may be NULL).
+ * @n: The notification message to be posted.
+ */
+static int smack_post_notification(const struct cred *w_cred,
+				   const struct cred *cred,
+				   struct watch_notification *n)
+{
+	struct smk_audit_info ad;
+	struct smack_known *subj, *obj;
+	int rc;
+
+	/* Always let maintenance notifications through. */
+	if (n->type == WATCH_TYPE_META)
+		return 0;
+
+	if (!cred)
+		return 0;
+	subj = smk_of_task(smack_cred(cred));
+	obj = smk_of_task(smack_cred(w_cred));
+
+	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NOTIFICATION);
+	rc = smk_access(subj, obj, MAY_WRITE, &ad);
+	rc = smk_bu_note("notification", subj, obj, MAY_WRITE, rc);
+	return rc;
+}
+#endif /* CONFIG_WATCH_QUEUE */
+
 /*
  * Smack Audit hooks
  *
@@ -4710,8 +4783,15 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(key_free, smack_key_free),
 	LSM_HOOK_INIT(key_permission, smack_key_permission),
 	LSM_HOOK_INIT(key_getsecurity, smack_key_getsecurity),
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	LSM_HOOK_INIT(watch_key, smack_watch_key),
+#endif
 #endif /* CONFIG_KEYS */
 
+#ifdef CONFIG_WATCH_QUEUE
+	LSM_HOOK_INIT(post_notification, smack_post_notification),
+#endif
+
  /* Audit hooks */
 #ifdef CONFIG_AUDIT
 	LSM_HOOK_INIT(audit_rule_init, smack_audit_rule_init),
* Re: [PATCH 00/11] Keyrings, Block and USB notifications [ver #8]
From: Linus Torvalds @ 2019-09-04 22:28 UTC
To: David Howells
Cc: keyrings, linux-usb, linux-block, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing

On Wed, Sep 4, 2019 at 3:15 PM David Howells <dhowells@redhat.com> wrote:
>
> Here's a set of patches to add a general notification queue concept and to
> add event sources such as:

Why?

I'm just going to be very blunt about this, and say that there is no way I can merge any of this *ever*, unless other people stand up and say that

 (a) they'll use it

and

 (b) they'll actively develop it and participate in testing and coding

Because I'm simply not willing to have the same situation that happened with the keyring ACL stuff this merge window happen with some other random feature some day in the future.

That change never had anybody else that showed any interest in it, it was never really clear why it was made, and it broke booting for me.

That had better never happen again, and I'm tired of seeing unexplained random changes to key handling that have one single author and nobody else involved.

And there is this whole long cover letter to explain what the code does, what you can do with it, and what the changes have been in revisions, but AT NO POINT does it explain what the point of the feature is at all.

Why would we want this, and what is the advantage over udev etc that already has event handling for things like block events and USB events?

What's the advantage of another random character device, and what's the use? Who is asking for this, and who would use it? Why are keys special, and why should you be able to track events on keys in the first place? Who is co-developing and testing this, and what's the point?

Fundamentally, I'm not even interested in seeing "Reviewed-by". New features need actual users and explanations for what they are, over and beyond the developer itself.

IOW, you need to have an outside person step in and say "yes, I need this". No more of these "David makes random changes without any external input" series.

               Linus
* Why add the general notification queue and its sources
From: David Howells @ 2019-09-05 17:01 UTC
To: Linus Torvalds
Cc: dhowells, Greg Kroah-Hartman, rstrode, swhiteho, nicolas.dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > Here's a set of patches to add a general notification queue concept and to
> > add event sources such as:
>
> Why?
>
> I'm just going to be very blunt about this, and say that there is no
> way I can merge any of this *ever*, unless other people stand up and
> say that
>
>  (a) they'll use it
>
> and
>
>  (b) they'll actively develop it and participate in testing and coding

Besides the core notification buffer which ties this together, there are a number of sources that I've implemented, not all of which are in this patch series:

 (1) Key/keyring notifications.

     If you have your kerberos tickets in a file/directory, your gnome desktop will monitor that using something like fanotify and tell you if your credentials cache changes.

     We also have the ability to cache your kerberos tickets in the session, user or persistent keyring so that it isn't left around on disk across a reboot or logout. Keyrings, however, cannot currently be monitored asynchronously, so the desktop has to poll for it - not so good on a laptop.

     This source will allow the desktop to avoid the need to poll.

 (2) USB notifications.

     GregKH was looking for a way to do USB notifications as I was looking to find additional sources to implement. I'm not sure how he wants to use them, but I'll let him speak to that himself.

 (3) Block notifications.

     This one I was thinking that I could make something like ddrescue better by letting it get notifications this way. This was a target of convenience since I had a dodgy disk I was trying to rescue.

     It could also potentially be used to help systemd, say, detect broken devices and avoid trying to unmount them when trying to reboot the machine.

     I can drop this for now if you prefer.

 (4) Mount notifications.

     This one is wanted to avoid repeated trawling of /proc/mounts or similar to work out changes to the mount object attributes and mount topology. I'm told that the proc file holding the namespace_sem is a point of contention, especially as the process of generating the text descriptions of the mounts/superblocks can be quite involved.

     The notifications directly indicate the mounts involved in any particular event and what the change was. You can poll /proc/mounts, but all you know is that something changed; you don't know what and you don't know how, and reading that file may race with multiple changes being effected.

     I pair this with a new fsinfo() system call that allows, amongst other things, the ability to retrieve in one go an { id, change counter } tuple from all the children of a specified mount, allowing buffer overruns to be cleaned up quickly.

     It's not just Red Hat that's potentially interested in this:

	https://lore.kernel.org/linux-fsdevel/293c9bd3-f530-d75e-c353-ddeabac27cf6@6wind.com/

 (5) Superblock notifications.

     This one is provided to allow systemd or the desktop to more easily detect events such as I/O errors and EDQUOT/ENOSPC.

I've tried to make the core multipurpose so that the price of the code footprint is mitigated.

David
* Re: Why add the general notification queue and its sources
From: Linus Torvalds @ 2019-09-05 17:19 UTC
To: David Howells
Cc: Greg Kroah-Hartman, rstrode, swhiteho, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing

On Thu, Sep 5, 2019 at 10:01 AM David Howells <dhowells@redhat.com> wrote:
>
> > I'm just going to be very blunt about this, and say that there is no
> > way I can merge any of this *ever*, unless other people stand up and
> > say that
> >
> >  (a) they'll use it
> >
> > and
> >
> >  (b) they'll actively develop it and participate in testing and coding
>
> Besides the core notification buffer which ties this together, there are a
> number of sources that I've implemented, not all of which are in this patch
> series:

You've at least now answered part of the "Why", but you didn't actually answer the whole "another developer" part.

I really don't like how nobody else than you seems to even look at any of the key handling patches. Because nobody else seems to care.

This seems to be another new subsystem / driver that has the same pattern. If it's all just you, I don't want to merge it, because I really want more than just other developers doing "Reviewed-by" after looking at somebody else's code that they don't actually use or really care about.

See what I'm saying?

New features that go into the kernel should have multiple users. Not a single developer who pushes both the kernel feature and the single use of that feature.

This very much comes from me reverting the key ACL pull. Not only did I revert it, ABSOLUTELY NOBODY even reacted to the revert. Nobody stepped up and said that they want that new ACL code, and pushed for a fix. There was some very little murmuring about it when Mimi at least figured out _why_ it broke, but other than that all the noise I saw about the revert was Eric Biggers pointing out it broke other things too, and that it had actually broken some test suites. But since it hadn't even been in linux-next, that too had been noticed much too late.

See what I'm saying? This whole "David Howells does his own features that nobody else uses" needs to stop. You need to have a champion. I just don't feel safe pulling these kinds of changes from you, because I get the feeling that ABSOLUTELY NOBODY ELSE ever really looked at it or really cared.

Most of the patches have nobody else even Cc'd, and even the patches that do have some "Reviewed-by" feel more like somebody else went "ok, the change looks fine to me", without any other real attachment to the code.

New kernel features and interfaces really need to have a higher barrier of entry than one developer working on his or her own thing.

Is that a change from 25 years ago? Oh yes it is. We can point to lots of "single developer did a thing" from years past. But things have changed. And once bitten, twice shy: I really am a _lot_ more nervous about all these key changes now.

                    Linus
* Re: Why add the general notification queue and its sources
From: Ray Strode @ 2019-09-05 18:32 UTC
To: Linus Torvalds
Cc: David Howells, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood

Hi,

On Thu, Sep 5, 2019 at 1:20 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> You've at least now answered part of the "Why", but you didn't
> actually answer the whole "another developer" part.

It's certainly something we've wanted in the GNOME world for a long time:

See for instance

https://bugzilla.redhat.com/show_bug.cgi?id=991110

and

https://bugzilla.gnome.org/show_bug.cgi?id=707402

from all the way back in 2013.

These are the alternatives I can think of:

- poll? Status quo, but not great for obvious wakeup reasons.

- use a different credential cache collection type that does support change notification? Some of the other types do support change notification, but have their own set of problems. But maybe we should just go back to DIR type credential cache collections and try to figure out the life cycle problems they pose, I don't know... or get more man power behind KCM...

- manage change notification entirely from userspace. Assume credentials will always be put in place from krb5-libs entry points, and just skip notification if it happens out from under the libraries. Maybe upstream kerberos guys would be onboard with this, I don't know. This seems less robust than having the kernel in the loop, though.

> I really don't like how nobody else than you seems to even look at any
> of the key handling patches. Because nobody else seems to care.

I've got no insight here, so I'll just throw a dart... viro, is this something you have any interest in watching closer?

> See what I'm saying? This whole "David Howells does his own features
> that nobody else uses" needs to stop. You need to have a champion. I
> just don't feel safe pulling these kinds of changes from you, because
> I get the feeling that ABSOLUTELY NOBODY ELSE ever really looked at it
> or really cared.

I get the "one man is not enough for proper maintenance" argument, and maybe it's true. I don't know. But I just want to point out, I have been asking dhowells for this change notification API for years, so it's not something he did on his own and for no particularly good reason. It solves a real problem and has a real-world use case.

He kindly did it because I (and Robbie Harwood and others) asked him about it, off and on, and he was able to fit it onto his priority list for us.

From this thread, it sounds like he solved a problem for Greg too, killing a couple birds with one stone?

--Ray
* Re: Why add the general notification queue and its sources
From: Linus Torvalds @ 2019-09-05 20:39 UTC
To: Ray Strode
Cc: David Howells, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood

On Thu, Sep 5, 2019 at 11:33 AM Ray Strode <rstrode@redhat.com> wrote:
>
> Hi,
>
> On Thu, Sep 5, 2019 at 1:20 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > You've at least now answered part of the "Why", but you didn't
> > actually answer the whole "another developer" part.
> It's certainly something we've wanted in the GNOME world for a long time:
>
> See for instance
>
> https://bugzilla.redhat.com/show_bug.cgi?id=991110

That is *way* too specific to make for any kind of generic notification mechanism.

Also, what is the security model here? Open a special character device, and you get access to random notifications from random sources?

That makes no sense. Do they have the same security permissions?

USB error reporting is one thing - and has completely different security rules than some per-user key thing (or system? or namespace? Or what?)

And why would you do a broken big-key thing in the kernel in the first place? Why don't you just have a kernel key to indirectly encrypt using a key and "additional user space data". The kernel should simply not take care of insane 1MB keys.

Big keys just don't make sense for a kernel. Just use the backing store THAT YOU HAVE TO HAVE ANYWAY. Introduce some "indirect key" instead that is used to encrypt and authenticate the backing store.

And mix in /proc/mounts tracking, which has a namespace component and completely different events and security model (likely "none" - since you can always read your own /proc/mounts).

So honestly, this all just makes me go "user interfaces are hard, all the users seem to have *completely* different requirements, and nobody has apparently really tested this in practice".

Maybe a generic notification mechanism is sensible. But I don't see how security issues could *possibly* be unified, and some of the examples given (particularly "track changes to /proc/mounts") seem to have obviously better alternatives (as in "just support poll() on it").

All this discussion has convinced me of is that this whole thing is half-baked and not ready even on a conceptual level.

So as far as I'm concerned, I think I want things like actual "Tested-by:" lines from actual users, because it's not clear that this makes sense. Gnome certainly should work as a regular user, if you need a system daemon for it with root privileges you might as well just do any notification entirely inside that daemon in user space. Same goes for /proc/mounts - which as mentioned has a much more obvious interface for waiting anyway.

User interfaces need a lot of thought and testing. They shouldn't be ad-hoc "maybe this could work for X, Y and Z" theories.

               Linus
* Re: Why add the general notification queue and its sources 2019-09-05 20:39 ` Linus Torvalds @ 2019-09-06 19:32 ` Ray Strode -1 siblings, 0 replies; 234+ messages in thread From: Ray Strode @ 2019-09-06 19:32 UTC (permalink / raw) To: Linus Torvalds Cc: David Howells, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood

Hi,

On Thu, Sep 5, 2019 at 4:39 PM Linus Torvalds <torvalds@linux-foundation.org> wrote:
> That is *way* too specific to make for any kind of generic
> notification mechanism.

Well from my standpoint, I just don't want to have to poll... I don't have a strong opinion about how it looks architecturally to reach that goal.

Ideally, at a higher level, I want the userspace api that gnome uses to be something like:

    err = krb5_cc_watch (ctx, ccache, (krb5_cc_change_fct) on_cc_change, &watch_fd);

then a watch_fd would get handed back and caller could poll on it. if it woke up poll(), caller would do

    krb5_cc_watch_update (ctx, ccache, watch_fd)

or so and it would trigger on_cc_change to get called (or something like that).

If under the hood, fd comes from opening /dev/watch_queue, and krb5_cc_watch_update reads from some mmap'ed buffer to decide whether or not to call on_cc_change, that's fine with me.

If under the hood, fd comes from a pipe fd returned from some ioctl or syscall, and krb5_cc_watch_update reads messages directly from that fd to decide whether or not to call on_cc_change, that's fine with me, too.

it could be an eventfd too, or whatever, just as long as it's something I can add to poll() and don't have to intermittently poll ... :-)

> Also, what is the security model here? Open a special character
> device, and you get access to random notifications from random
> sources?

I guess dhowells answered this...

> And why would you do a broken big-key thing in the kernel in the first
> place? Why don't you just have a kernel key to indirectly encrypt
> using a key and "additional user space data". The kernel should simply
> not take care of insane 1MB keys.

🤷 dunno. I assume you're referencing the discussions from comment 0 on that 2013 bug. I wasn't involved in those discussions, I just chimed in after they happened trying to avoid having to add polling :-)

I have no idea why a ticket would get that large. I assume it only is in weird edge cases.

Anyway, gnome treats the tickets as opaque blobs. it doesn't do anything with them other than tell the user when they need to get refreshed...

all the actual key manipulation happens from krb5 libraries.

of course, one advantage of having the tickets kernel side is nfs could in theory access them directly, rather than up calling back to userspace...

--Ray

^ permalink raw reply [flat|nested] 234+ messages in thread
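The "hand back an fd, poll() on it, then run an update call that fires the callback" shape that Ray describes can be sketched in plain C. This is only an illustration under stated assumptions: krb5_cc_watch and krb5_cc_watch_update are proposed names from the message above, not an existing krb5 API, so the sketch uses an eventfd as a stand-in event source (one of the options Ray mentions), and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical callback type, standing in for on_cc_change. */
typedef void (*demo_change_fn)(void *data);

/* Example callback: counts how many change notifications were seen. */
void demo_count_change(void *data)
{
    (*(int *)data)++;
}

/* Create the pollable fd that a caller would add to its main loop. */
int demo_watch_open(void)
{
    return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Stand-in for the event source: mark the fd readable. */
int demo_watch_signal(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Wait up to timeout_ms for the fd to become readable. If it does, drain
 * the eventfd counter and invoke the callback once.
 * Returns 1 if the callback ran, 0 on timeout, -1 on error.
 */
int demo_watch_update(int fd, int timeout_ms, demo_change_fn on_change,
                      void *data)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    uint64_t count;
    int n;

    assert(on_change != NULL);

    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0 || !(pfd.revents & POLLIN))
        return 0;
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    on_change(data);
    return 1;
}
```

A caller would keep the fd from demo_watch_open() in its poll set and call demo_watch_update() when it wakes. Whether the fd is backed by /dev/watch_queue, a pipe or an eventfd is hidden behind that call, which is the point Ray is making.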
* Re: Why add the general notification queue and its sources 2019-09-06 19:32 ` Ray Strode @ 2019-09-06 19:41 ` Ray Strode -1 siblings, 0 replies; 234+ messages in thread From: Ray Strode @ 2019-09-06 19:41 UTC (permalink / raw) To: Linus Torvalds Cc: David Howells, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood Hi, On Fri, Sep 6, 2019 at 3:32 PM Ray Strode <rstrode@redhat.com> wrote: > of course, one advantage of having the tickets kernel side is nfs could > in theory access them directly, rather than up calling back to userspace... No, that's not true actually, it's still going to need to go to userspace to do hairy context setup i guess... so 🤷 dunno. ^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: Why add the general notification queue and its sources 2019-09-06 19:32 ` Ray Strode (?) @ 2019-09-06 19:53 ` Robbie Harwood -1 siblings, 0 replies; 234+ messages in thread From: Robbie Harwood @ 2019-09-06 19:53 UTC (permalink / raw) To: Ray Strode, Linus Torvalds Cc: David Howells, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi [-- Attachment #1: Type: text/plain, Size: 3154 bytes --] Ray Strode <rstrode@redhat.com> writes: > Linus Torvalds <torvalds@linux-foundation.org> wrote: > >> That is *way* too specific to make for any kind of generic >> notification mechanism. > > Well from my standpoint, I just don't want to have to poll... I don't > have a strong opinion about how it looks architecturally to reach that > goal. > > Ideally, at a higher level, I want the userspace api that gnome uses > to be something like: > > err = krb5_cc_watch (ctx, ccache, (krb5_cc_change_fct) on_cc_change , > &watch_fd); > > then a watch_fd would get handed back and caller could poll on it. if > it woke up poll(), caller would do > > krb5_cc_watch_update (ctx, ccache, watch_fd) > > or so and it would trigger on_cc_change to get called (or something like that). > > If under the hood, fd comes from opening /dev/watch_queue, and > krb5_cc_watch_update reads from some mmap'ed buffer to decide whether > or not to call on_cc_change, that's fine with me. > > If under the hood, fd comes from a pipe fd returned from some ioctl or > syscall, and krb5_cc_watch_update reads messages directly from that fd > to decide whether or not to call on_cc_change, that's fine with > me. too. > > it could be an eventfd too, or whatever, too, just as long as its > something I can add to poll() and don't have to intermittently poll > ... :-) > > >> And why would you do a broken big-key thing in the kernel in the >> first place? 
Why don't you just have a kernel key to indirectly >> encrypt using a key and "additional user space data". The kernel >> should simply not take care of insane 1MB keys. > > 🤷 dunno. I assume you're referencing the discussions from comment 0 > on that 2013 bug. I wasn't involved in those discussions, I just > chimed in after they happened trying to avoid having to add polling > :-) > > I have no idea why a ticket would get that large. I assume it only is > in weird edge cases. Sadly they're not weird, but yes. Kerberos tickets can get decently large due to Microsoft's PACs (see MS-PAC and MS-KILE), which we need to process, understand, and store for Active Directory interop. I'm not sure if I've personally made one over 1MB, but it could easily occur given enough claims (MS-ADTS). > Anyway, gnome treats the tickets as opaque blobs. it doesn't do anything > with them other than tell the user when they need to get refreshed... > > all the actual key manipulation happens from krb5 libraries. > > of course, one advantage of having the tickets kernel side is nfs > could in theory access them directly, rather than up calling back to > userspace... Easy availability to filesystems is in fact the main theoretical advantage of the keyring for us in krb5 (and, for whatever it's worth, is why we're interested in namespaced keyrings for containers). Our privilege separation mechanism (gssproxy) is cache-type agnostic, and we do have other credential cache types (KCM and DIR/FILE) in the implementation. Thanks, --Robbie -- Robbie Harwood Kerberos Development Lead Security Engineering Team Red Hat, Inc. Boston, MA, US [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --] ^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: Why add the general notification queue and its sources 2019-09-05 18:32 ` Ray Strode @ 2019-09-05 21:32 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-05 21:32 UTC (permalink / raw) To: Linus Torvalds Cc: dhowells, Ray Strode, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Also, what is the security model here? Open a special character
> device, and you get access to random notifications from random
> sources?
>
> That makes no sense. Do they have the same security permissions?

Sigh. It doesn't work like that. I tried to describe this in the manpages I referred to in the cover note. Obviously I didn't do a good enough job. Let me try and explain the general workings and the security model here.

 (1) /dev/watch_queue just implements locked-in-memory buffers. It gets you no events by simply opening it.

     Each time you open it you get your own private buffer. Buffers are not shared. Creation of buffers is limited by ENFILE, EMFILE and RLIMIT_MEMLOCK.

 (2) A buffer is implemented as a pollable ring buffer, with the head pointer belonging to the kernel and the tail pointer belonging to userspace. Userspace mmaps the buffer.

     The kernel *only ever* reads the head and tail pointer from a buffer; it never reads anything else.

     When it wants to post a message to a buffer, the kernel reads the pointers and then does one of three things:

     (a) If the pointers were incoherent it drops the message.

     (b) If the buffer was full the kernel writes a flag to indicate this and drops the message.

     (c) Otherwise, the kernel writes a message and maybe padding at the place(s) it expects and writes the head pointer. If userspace was busy trashing the place, that should not cause a problem for the kernel.

     The buffer pointers are expected to run to the end and wrap naturally; they're only masked off at the point of actually accessing the buffer.

 (3) You connect event sources to your buffer, e.g.:

        fd = open("/dev/watch_queue", ...);
        keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, ...);

     or:

        watch_mount(AT_FDCWD, "/net", 0, fd, ...);

     Security is checked at the point of connection to make sure you have permission to access that source. You have to have View permission on a key/keyring for key events, for example, and you have to have execute permission on a directory for mount events.

     The LSM gets a look-in too: Smack checks you have read permission on a key for example.

 (4) You can connect multiple sources of different types to your buffer and a source can be connected to multiple buffers at a time.

 (5) Security is checked when an event is delivered to make sure the triggerer of the event has permission to give you that event. Smack requires that the triggerer has write permission on the opener of the buffer for example.

 (6) poll() signals POLLIN|POLLRDNORM if there is stuff in the buffer and POLLERR if the pointers are incoherent.

David

^ permalink raw reply [flat|nested] 234+ messages in thread
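The posting rules in point (2) can be illustrated with a small single-threaded model. This is a sketch under stated assumptions, not the code from the patch set: it uses fixed-size slots instead of variable-length typed records, it omits the memory barriers a real kernel/userspace shared ring would need, and all demo_* names and the struct layout are made up for illustration. It does keep the three outcomes (incoherent pointers, full, posted) and the free-running indices that are masked only when the buffer is accessed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RING_SLOTS 8u                      /* must be a power of two */
#define DEMO_RING_MASK  (DEMO_RING_SLOTS - 1u)
#define DEMO_SLOT_BYTES 16u

_Static_assert((DEMO_RING_SLOTS & DEMO_RING_MASK) == 0,
               "slot count must be a power of two");

enum demo_post_result {
    DEMO_POSTED,
    DEMO_DROPPED_FULL,
    DEMO_DROPPED_INCOHERENT
};

struct demo_ring {
    uint32_t head;     /* advanced only by the producer (the "kernel")   */
    uint32_t tail;     /* advanced only by the consumer (the "userspace") */
    uint32_t overrun;  /* set when a message is dropped for lack of space */
    unsigned char slot[DEMO_RING_SLOTS][DEMO_SLOT_BYTES];
};

/* Producer side: read both indices, then take one of the three paths. */
enum demo_post_result demo_ring_post(struct demo_ring *r, const void *msg,
                                     size_t len)
{
    uint32_t head = r->head;
    uint32_t tail = r->tail;
    uint32_t used = head - tail;   /* unsigned arithmetic wraps naturally */

    assert(len <= DEMO_SLOT_BYTES);

    if (used > DEMO_RING_SLOTS)            /* (a) indices make no sense */
        return DEMO_DROPPED_INCOHERENT;
    if (used == DEMO_RING_SLOTS) {         /* (b) full: flag and drop */
        r->overrun = 1;
        return DEMO_DROPPED_FULL;
    }
    /* (c) write the record, then publish it by advancing head. */
    memset(r->slot[head & DEMO_RING_MASK], 0, DEMO_SLOT_BYTES);
    memcpy(r->slot[head & DEMO_RING_MASK], msg, len);
    r->head = head + 1;
    return DEMO_POSTED;
}

/* Consumer side: copy out one slot and advance tail. Returns 1 or 0. */
int demo_ring_consume(struct demo_ring *r, void *out)
{
    if (r->head == r->tail)
        return 0;
    memcpy(out, r->slot[r->tail & DEMO_RING_MASK], DEMO_SLOT_BYTES);
    r->tail = r->tail + 1;
    return 1;
}
```

Because the indices are never reduced modulo the ring size, "full" (head - tail equal to the slot count) and "empty" (head equal to tail) are distinguishable without sacrificing a slot, and a tail that userspace has scribbled on shows up as an impossible fill level rather than as an out-of-bounds access.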
* Re: Why add the general notification queue and its sources 2019-09-05 21:32 ` David Howells @ 2019-09-05 22:08 ` Linus Torvalds -1 siblings, 0 replies; 234+ messages in thread From: Linus Torvalds @ 2019-09-05 22:08 UTC (permalink / raw) To: David Howells Cc: Ray Strode, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood On Thu, Sep 5, 2019 at 2:32 PM David Howells <dhowells@redhat.com> wrote: > > (1) /dev/watch_queue just implements locked-in-memory buffers. It gets you > no events by simply opening it. Cool. In-memory buffers. But I know - we *have* one of those. There's already a system call for it, and has been forever. One that we then extended to allow people to change the buffer size, and do a lot of other things with. It's called "pipe()". And you can give the writing side to other user space processes too, in case you are running an older kernel that didn't have some "event pipe support". It comes with resource management, because people already use those things. If you want to make a message protocol on top of it, it has cool atomicity guarantees for any message size less than PIPE_BUF, but to make things simple, maybe just say "fixed record sizes of 64 bytes" or something like that for events. Then you can use them from things like perl scripts, not just magical C programs. Why do we need a new kind of super-fancy high-speed thing for event reporting? If you have *so* many events that pipe handling is a performance issue, you have something seriously wrong going on. So no. I'm not interested in a magical memory-mapped pipe that is actually more limited than the real thing. Linus ^ permalink raw reply [flat|nested] 234+ messages in thread
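Linus's alternative, a plain pipe carrying fixed 64-byte records, can be sketched from the userspace side as follows. This is an illustration only: the record layout, the field names and the demo_* helpers are assumptions, not anything proposed in the thread beyond "fixed record sizes of 64 bytes". The property it leans on is the POSIX guarantee that a write of no more than PIPE_BUF bytes to a pipe is not interleaved with other writes.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define DEMO_RECORD_SIZE 64

/* Hypothetical fixed-size event record; the layout is an assumption. */
struct demo_event {
    uint32_t type;
    uint32_t subtype;
    char     payload[DEMO_RECORD_SIZE - 2 * sizeof(uint32_t)];
};

_Static_assert(sizeof(struct demo_event) == DEMO_RECORD_SIZE,
               "record must be exactly 64 bytes");
_Static_assert(DEMO_RECORD_SIZE <= PIPE_BUF,
               "record must fit in one atomic pipe write");

/* Write one whole record; returns 0 on success, -1 otherwise. */
int demo_event_send(int wfd, uint32_t type, uint32_t subtype,
                    const char *text)
{
    struct demo_event ev;

    assert(text != NULL);

    memset(&ev, 0, sizeof(ev));
    ev.type = type;
    ev.subtype = subtype;
    strncpy(ev.payload, text, sizeof(ev.payload) - 1);

    return write(wfd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev) ? 0 : -1;
}

/* Read one whole record; returns 1 on success, 0 on EOF, -1 otherwise. */
int demo_event_recv(int rfd, struct demo_event *ev)
{
    ssize_t n = read(rfd, ev, sizeof(*ev));

    if (n == 0)
        return 0;
    return n == (ssize_t)sizeof(*ev) ? 1 : -1;
}
```

With every writer emitting whole 64-byte records, a reader that always asks for 64 bytes sees record boundaries preserved, and the same stream can be consumed from a script with fixed-length reads, which is the "not just magical C programs" argument.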
* Re: Why add the general notification queue and its sources 2019-09-05 21:32 ` David Howells @ 2019-09-05 23:18 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-05 23:18 UTC (permalink / raw) To: Linus Torvalds Cc: dhowells, Ray Strode, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood Linus Torvalds <torvalds@linux-foundation.org> wrote: > But I know - we *have* one of those. There's already a system call for > it, and has been forever. One that we then extended to allow people to > change the buffer size, and do a lot of other things with. > > It's called "pipe()". And you can give the writing side to other user > space processes too, in case you are running an older kernel that > didn't have some "event pipe support". It comes with resource > management, because people already use those things. Can you write into a pipe from softirq context and/or with spinlocks held and/or with the RCU read lock held? That is a requirement. Another is that messages get inserted whole or not at all (or if they are truncated, the size field gets updated). Since one end would certainly be attached to an fd, it looks on the face of it that writing into the pipe would require taking pipe->mutex. David ^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: Why add the general notification queue and its sources 2019-09-05 23:18 ` David Howells @ 2019-09-06 0:07 ` Linus Torvalds -1 siblings, 0 replies; 234+ messages in thread From: Linus Torvalds @ 2019-09-06 0:07 UTC (permalink / raw) To: David Howells Cc: Ray Strode, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood On Thu, Sep 5, 2019 at 4:18 PM David Howells <dhowells@redhat.com> wrote: > > Can you write into a pipe from softirq context and/or with spinlocks held > and/or with the RCU read lock held? That is a requirement. Another is that > messages get inserted whole or not at all (or if they are truncated, the size > field gets updated). Right now we use a mutex for the buffer locking, so no, pipe buffers are not irq-safe or atomic. That's due to the whole "we may block on data from user space" when doing a write. HOWEVER. Pipes actually have buffers on two different levels: there's the actual data buffers themselves (each described by a "struct pipe_buffer"), and there's the circular queue of them (the "pipe->bufs[]" array, with pipe->curbuf/nrbufs) that points to individual data buffers. And we could easily separate out that data buffer management. Right now it's not really all that separated: people just do things like

    int newbuf = (pipe->curbuf + bufs) & (pipe->buffers-1);
    struct pipe_buffer *buf = pipe->bufs + newbuf;
    ...
    pipe->nrbufs++;

to add a buffer into that circular array of buffers, but _that_ part could be made separate. It's just all protected by the pipe mutex right now, so it has never been an issue. And yes, atomicity of writes has actually been an integral part of pipes since forever. It's actually the only unambiguous atomicity that POSIX guarantees. It only holds for writes to pipes of at most PIPE_BUF bytes, but that's 4096 on Linux.
> Since one end would certainly be attached to an fd, it looks on the face of it > that writing into the pipe would require taking pipe->mutex. That's how the normal synchronization is done, yes. And changing that in general would be pretty painful. For example, two concurrent user-space writers might take page faults and just generally be painful, and the pipe locking needs to serialize that. So the mutex couldn't go away from pipes in general - it would remain for read/write/splice mutual exclusion (and it's not just the data it protects, it's the reader/writer logic for EPIPE etc). But the low-level pipe->bufs[] handling is another issue entirely. Even when a user space writer copies things from user space, it does so into a pre-allocated buffer that is then attached to the list of buffers somewhat separately (there's a magical special case where you can re-use a buffer that is marked as "I can be reused" and append into an already allocated buffer). And adding new buffers *could* be done with its own separate locking. If you have a blocking writer (ie a user space data source), that would still take the pipe mutex, and it would delay the user space readers (because the readers also need the mutex), but it should not be all that hard to just make the whole "curbuf/nrbufs" handling use its own locking (maybe even some lockless atomics and cmpxchg). So a kernel writer could "insert" a "struct pipe_buffer" atomically, and wake up the reader atomically. No need for the other complexity that is protected by the mutex. The biggest problem is perhaps that the number of pipe buffers per pipe is fairly limited by default. PIPE_DEF_BUFFERS is 16, and if we'd insert using the ->bufs[] array, that would be the limit of "number of messages". But each message could be any size (we've historically limited pipe buffers to one page each, but that limit isn't all that hard. You could put more data in there).
The number of pipe buffers _is_ dynamic, so the above PIPE_DEF_BUFFERS isn't a hard limit, but it would be the default. Would it be entirely trivial to do all the above? No. But it's *literally* just finding the places that work with pipe->curbuf/nrbufs and making them use atomic updates. You'd find all the places by just renaming them (and making them atomic or whatever) and the compiler will tell you "this area needs fixing". We've actually used pipes for messages before: autofs uses a magic packetized pipe buffer thing. It didn't need any extra atomicity, though, so it still all worked with the regular pipe->mutex thing. And there is a big advantage from using pipes. They really would work with almost anything. You could even mix-and-match "data generated by kernel" and "data done by 'write()' or 'splice()' by a user process". NOTE! I'm not at all saying that pipes are perfect. You'll find people who swear by sockets instead. They have their own advantages (and disadvantages). Most people who do packet-based stuff tend to prefer sockets, because those have standard packet-based models (Linux pipes have that packet mode too, but it's certainly not standard, and I'm not even sure we ever exposed it to user space - it could be that it's only used by the autofs daemon). I have a soft spot for pipes, just because I think they are simpler than sockets. But that soft spot might be misplaced. Linus ^ permalink raw reply [flat|nested] 234+ messages in thread
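The idea in the message above, giving the curbuf/nrbufs ring its own small lock so that a writer which cannot sleep can still insert a whole message, can be sketched in user-space C. This is only an illustration under stated assumptions: struct msg_ring, struct msg_buf, ring_insert() and ring_remove() are hypothetical names, not kernel code, and a C11 atomic_flag spin loop stands in for whatever spinlock the kernel would actually use.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 16   /* same value as PIPE_DEF_BUFFERS; must be a power of two */

struct msg_buf {        /* hypothetical stand-in for struct pipe_buffer */
    const void *data;
    size_t len;
};

struct msg_ring {       /* hypothetical stand-in for the pipe's buffer ring */
    atomic_flag lock;   /* covers only curbuf, nrbufs and bufs[] */
    unsigned int curbuf;    /* index of the oldest occupied slot */
    unsigned int nrbufs;    /* number of occupied slots */
    struct msg_buf bufs[RING_SLOTS];
};

static void ring_lock(struct msg_ring *r)
{
    /* Busy-wait; this never sleeps, unlike a mutex. */
    while (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        ;
}

static void ring_unlock(struct msg_ring *r)
{
    atomic_flag_clear_explicit(&r->lock, memory_order_release);
}

/* Insert one whole message, or nothing at all if the ring is full. */
static bool ring_insert(struct msg_ring *r, const void *data, size_t len)
{
    bool ok = false;

    ring_lock(r);
    if (r->nrbufs < RING_SLOTS) {
        unsigned int newbuf = (r->curbuf + r->nrbufs) & (RING_SLOTS - 1);

        r->bufs[newbuf].data = data;
        r->bufs[newbuf].len = len;
        r->nrbufs++;
        ok = true;
    }
    ring_unlock(r);
    return ok;
}

/* Remove the oldest message into *out; returns false if the ring is empty. */
static bool ring_remove(struct msg_ring *r, struct msg_buf *out)
{
    bool ok = false;

    ring_lock(r);
    if (r->nrbufs > 0) {
        *out = r->bufs[r->curbuf];
        r->curbuf = (r->curbuf + 1) & (RING_SLOTS - 1);
        r->nrbufs--;
        ok = true;
    }
    ring_unlock(r);
    return ok;
}
```

A ring would be set up with `struct msg_ring ring = { .lock = ATOMIC_FLAG_INIT };`. The point of the sketch is that the lock is held only for a few index updates, so a full ring makes the insert fail immediately rather than block, which matches the "inserted whole or not at all" requirement.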
* Re: Why add the general notification queue and its sources 2019-09-05 23:18 ` David Howells @ 2019-09-06 10:09 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-06 10:09 UTC (permalink / raw) To: Linus Torvalds Cc: dhowells, Ray Strode, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood Linus Torvalds <torvalds@linux-foundation.org> wrote: > But it's *literally* just finding the places that work with > pipe->curbuf/nrbufs and making them use atomic updates. No. It really isn't. That's two variables that describe the occupied section of the buffer. Unless you have something like a 68020 with CAS2, or put them next to each other so you can use CMPXCHG8, you can't do that. They need converting to head/tail pointers first. > They really would work with almost anything. You could even mix-and-match > "data generated by kernel" and "data done by 'write()' or 'splice()' by a > user process". Imagine that userspace writes a large message and takes the mutex. At the same time something in softirq context decides *it* wants to write a message - it can't take the mutex and it can't wait, so the userspace write would have to cause the kernel message to be dropped. What I would have to do is make a write to a notification pipe go through post_notification() and limit the size to the maximum for a single message. Much easier to simply suppress writes and splices on pipes that have been set up to be notification queues - at least for now. David ^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: Why add the general notification queue and its sources 2019-09-06 10:09 ` David Howells @ 2019-09-06 15:35 ` Linus Torvalds -1 siblings, 0 replies; 234+ messages in thread From: Linus Torvalds @ 2019-09-06 15:35 UTC (permalink / raw) To: David Howells Cc: Ray Strode, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood On Fri, Sep 6, 2019 at 3:09 AM David Howells <dhowells@redhat.com> wrote: > > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > But it's *literally* just finding the places that work with > > pipe->curbuf/nrbufs and making them use atomic updates. > > No. It really isn't. That's two variables that describe the occupied section > of the buffer. Unless you have something like a 68020 with CAS2, or put them > next to each other so you can use CMPXCHG8, you can't do that. > > They need converting to head/tail pointers first. You misunderstand - because I phrased it badly. I meant "atomic" in the traditional kernel sense, as in "usable in not thread context" (eg GFP_ATOMIC etc). I'd start out just using a spinlock. I do agree that we could try to be fancy and do it entirely locklessly too, and I mentioned that in another part: "[..] it should not be all that hard to just make the whole "curbuf/nrbufs" handling use its own locking (maybe even some lockless atomics and cmpxchg)" but I also very much agree that it's much more complex. The main complexity of a lockless thing is actually almost certainly not in curbuf/nrbufs, because those could easily be packed as two 16-bit values in a 32-bit entity and then regular cmpxchg works fine. No, the complexity in the lockless model is that then you have to be very careful with the "buf[]" array update too. Maybe that's trivial (just make sure that they are NULL when not used), but it just looks less than wonderfully easy. 
So a lockless update I'm sure is _doable_ with some cleverness, but is probably not really worth it. That's particularly true since we already *have* a spinlock that we would take anyway: we could strive to use the waitqueue spinlock in pipe->wait, and not even really add any new locking. That would require a bit of cleverness too and re-ordering things more, but we do that in other places (eg completions, but the fs_pin code does it too, and a few other cases). Look for "wake_up_locked()" and friends, which is a sure-fire sign that somebody is playing games and taking the wait-queue lock manually for their own nefarious reasons. > > They really would work with almost anything. You could even mix-and-match > > "data generated by kernel" and "data done by 'write()' or 'splice()' by a > > user process". > > Imagine that userspace writes a large message and takes the mutex. At the > same time something in softirq context decides *it* wants to write a message - > it can't take the mutex and it can't wait, so the userspace write would have > to cause the kernel message to be dropped. No. You're missing the point entirely. The mutex is entirely immaterial for the "insert a message". It is only used for user-space synchronization. The "add message to the pipe buffers" would only do the low-level buffer updates (whether using a new spinlock, re-using the pipe waitqueue lock, or entirely locklessly, ends up being then just an implementation detail). Note that user-space writes are defined to be atomic, but they are (a) not ordered and (b) only atomic up to a single buffer entry (which is that PIPE_BUF limit). So you can always put in a new buffer entry at any time. Obviously if a user space write just fills up the whole queue (or _other_ messages fill up the whole queue) you'd have to drop the notification. But that's always true. That's true even in your thing. The only difference is that we _allow_ other user spaces to write to the notification queue too.
But if you don't want to allow that, then don't give out the write side of the pipe to any untrusted user space. But in *general*, allowing user space to write to the pipe is a great feature: it means that your notification source *can* be a user space daemon that you gave the write side of the pipe to (possibly using fd passing, possibly by just forking your own user-space child or cloning a thread). So for example, from a consumer standpoint, you can start off doing these things in user space with a helper thread that feeds the pipe (for example, polling /proc/mounts every second), and then when you've prototyped it and are happy with it, you can add the system call (or ioctl or whatever) to make the kernel generate the messages so that you don't have to poll. But now, once you have the kernel patch, you already have a proven user, and you can show numbers ("My user-space thing works, but it uses up 0.1% CPU time and has that nasty up-to-one-second latency because of polling"). Ta-daa! End result: it's backwards compatible, it's prototypable, and it's fairly easily extensible. Want to add a new source of events? Just pass the pipe to any random piece of code you want. It needs kernel support only when you've proven the concept _and_ you can show that "yeah, this user space polling model is a real performance or complexity problem" or whatever. This is why I like pipes. You can use them today. They are simple, and extensible, and you don't need to come up with a new subsystem and some untested ad-hoc thing that nobody has actually used. And they work automatically with all the existing infrastructure. They work with whatever perl or shell scripts, they work with poll/select loops, they work with user-space sources of events, they are just very flexible. Linus ^ permalink raw reply [flat|nested] 234+ messages in thread
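The remark above that curbuf/nrbufs "could easily be packed as two 16-bit values in a 32-bit entity and then regular cmpxchg works fine" can be illustrated with C11 atomics. This is a hedged user-space sketch, not kernel code: the packing layout and the names pack(), claim_slot() and release_oldest() are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 16u  /* power of two, same value as PIPE_DEF_BUFFERS */

/* Assumed layout: low 16 bits hold curbuf, high 16 bits hold nrbufs. */
static inline uint32_t pack(uint16_t curbuf, uint16_t nrbufs)
{
    return (uint32_t)curbuf | ((uint32_t)nrbufs << 16);
}

static inline uint16_t curbuf_of(uint32_t v) { return (uint16_t)(v & 0xffffu); }
static inline uint16_t nrbufs_of(uint32_t v) { return (uint16_t)(v >> 16); }

/*
 * Claim the next free slot with a compare-exchange loop. On success the
 * slot index is stored in *slot; returns false if the ring is full.
 */
static bool claim_slot(_Atomic uint32_t *state, unsigned int *slot)
{
    uint32_t old = atomic_load_explicit(state, memory_order_relaxed);

    for (;;) {
        uint16_t cur = curbuf_of(old), nr = nrbufs_of(old);
        uint32_t next;

        if (nr >= RING_SLOTS)
            return false;
        next = pack(cur, (uint16_t)(nr + 1));
        if (atomic_compare_exchange_weak_explicit(state, &old, next,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
            *slot = (cur + nr) & (RING_SLOTS - 1);
            return true;
        }
        /* 'old' now holds the current value; loop and retry. */
    }
}

/* Release the oldest slot; its index is stored in *slot. */
static bool release_oldest(_Atomic uint32_t *state, unsigned int *slot)
{
    uint32_t old = atomic_load_explicit(state, memory_order_relaxed);

    for (;;) {
        uint16_t cur = curbuf_of(old), nr = nrbufs_of(old);
        uint32_t next;

        if (nr == 0)
            return false;
        next = pack((uint16_t)((cur + 1) & (RING_SLOTS - 1)),
                    (uint16_t)(nr - 1));
        if (atomic_compare_exchange_weak_explicit(state, &old, next,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
            *slot = cur;
            return true;
        }
    }
}
```

Note what the sketch leaves out: once claim_slot() succeeds, nrbufs already counts a slot whose contents have not been written yet, so a concurrent reader could see it too early. That is the bufs[] array problem the message calls the real complexity of a lockless scheme, and it is why starting with a spinlock is the simpler choice.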
* Re: Why add the general notification queue and its sources 2019-09-06 15:35 ` Linus Torvalds @ 2019-09-06 15:53 ` Linus Torvalds -1 siblings, 0 replies; 234+ messages in thread From: Linus Torvalds @ 2019-09-06 15:53 UTC (permalink / raw) To: David Howells Cc: Ray Strode, Greg Kroah-Hartman, Steven Whitehouse, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood On Fri, Sep 6, 2019 at 8:35 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > This is why I like pipes. You can use them today. They are simple, and > extensible, and you don't need to come up with a new subsystem and > some untested ad-hoc thing that nobody has actually used. The only _real_ complexity is to make sure that events are reliably parseable. That's where you really want to use the Linux-only "packet pipe" thing, because otherwise you have to have size markers or other things to delineate events. But if you do that, then it really becomes trivial. And I checked, we made it available to user space, even if the original reason for that code was kernel-only autofs use: you just need to make the pipe be O_DIRECT. This overly stupid program shows off the feature:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd[2];
        char buf[10];

        pipe2(fd, O_DIRECT | O_NONBLOCK);
        write(fd[1], "hello", 5);
        write(fd[1], "hi", 2);
        read(fd[0], buf, sizeof(buf));
        read(fd[0], buf, sizeof(buf));
        return 0;
    }

and if you strace it (because I was too lazy to add error handling or printing of results), you'll see

    write(4, "hello", 5) = 5
    write(4, "hi", 2) = 2
    read(3, "hello", 10) = 5
    read(3, "hi", 10) = 2

note how you got packets of data on the reader side, instead of getting the traditional "just buffer it as a stream". So now you can even have multiple readers of the same event pipe, and packetization is obvious and trivial.
Of course, I'm not sure why you'd want to have multiple readers, and you'd lose _ordering_, but if all events are independent, this _might_ be a useful thing in a threaded environment. Maybe. (Side note: a zero-sized write will not cause a zero-sized packet. It will just be dropped). Linus ^ permalink raw reply [flat|nested] 234+ messages in thread
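For reference, here is a variant of the program above with the error handling and result printing that the message says were left out. It is a sketch rather than anything posted in the thread: packet_pipe_demo() is a hypothetical helper name, and it assumes a Linux kernel and C library that support O_DIRECT on pipe2().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Write two packets into an O_DIRECT ("packet mode") pipe and read them
 * back. lens[0] and lens[1] receive the sizes returned by the two reads.
 * Returns 0 on success, -1 if any call fails.
 */
static int packet_pipe_demo(ssize_t lens[2])
{
    int fd[2];
    char buf[10];

    if (pipe2(fd, O_DIRECT | O_NONBLOCK) < 0) {
        perror("pipe2");
        return -1;
    }
    if (write(fd[1], "hello", 5) != 5 || write(fd[1], "hi", 2) != 2) {
        perror("write");
        goto fail;
    }
    for (int i = 0; i < 2; i++) {
        /* Each read returns at most one packet, even though buf has room for both. */
        lens[i] = read(fd[0], buf, sizeof(buf));
        if (lens[i] < 0) {
            perror("read");
            goto fail;
        }
        printf("read %zd bytes: %.*s\n", lens[i], (int)lens[i], buf);
    }
    close(fd[0]);
    close(fd[1]);
    return 0;

fail:
    close(fd[0]);
    close(fd[1]);
    return -1;
}
```

Calling packet_pipe_demo() from main() should report one read of 5 bytes and one of 2 bytes, matching the strace output quoted above, instead of a single 7-byte read as a stream-mode pipe would give.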
* Re: Why add the general notification queue and its sources 2019-09-06 15:53 ` Linus Torvalds @ 2019-09-06 16:12 ` Steven Whitehouse -1 siblings, 0 replies; 234+ messages in thread From: Steven Whitehouse @ 2019-09-06 16:12 UTC (permalink / raw) To: Linus Torvalds, David Howells Cc: Ray Strode, Greg Kroah-Hartman, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood Hi, On 06/09/2019 16:53, Linus Torvalds wrote: > On Fri, Sep 6, 2019 at 8:35 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> This is why I like pipes. You can use them today. They are simple, and >> extensible, and you don't need to come up with a new subsystem and >> some untested ad-hoc thing that nobody has actually used. > The only _real_ complexity is to make sure that events are reliably parseable. > > That's where you really want to use the Linux-only "packet pipe" > thing, because otherwise you have to have size markers or other things > to delineate events. But if you do that, then it really becomes > trivial. > > And I checked, we made it available to user space, even if the > original reason for that code was kernel-only autofs use: you just > need to make the pipe be O_DIRECT.
> > This overly stupid program shows off the feature: > > #define _GNU_SOURCE > #include <fcntl.h> > #include <unistd.h> > > int main(int argc, char **argv) > { > int fd[2]; > char buf[10]; > > pipe2(fd, O_DIRECT | O_NONBLOCK); > write(fd[1], "hello", 5); > write(fd[1], "hi", 2); > read(fd[0], buf, sizeof(buf)); > read(fd[0], buf, sizeof(buf)); > return 0; > } > > and if you strace it (because I was too lazy to add error handling or > printing of results), you'll see > > write(4, "hello", 5) = 5 > write(4, "hi", 2) = 2 > read(3, "hello", 10) = 5 > read(3, "hi", 10) = 2 > > note how you got packets of data on the reader side, instead of > getting the traditional "just buffer it as a stream". > > So now you can even have multiple readers of the same event pipe, and > packetization is obvious and trivial. Of course, I'm not sure why > you'd want to have multiple readers, and you'd lose _ordering_, but if > all events are independent, this _might_ be a useful thing in a > threaded environment. Maybe. > > (Side note: a zero-sized write will not cause a zero-sized packet. It > will just be dropped). > > Linus The events are generally not independent - we would need ordering either implicit in the protocol or explicit in the messages. We also need to know in case messages are dropped too - doesn't need to be anything fancy, just some idea that since we last did a read, there are messages that got lost, most likely due to buffer overrun. That is why the initial idea was to use netlink, since it solves a lot of those issues. The downside was that the indirect nature of the netlink sockets resulted in making it tricky to know the namespace of the process to which the message was to be delivered (and hence whether it should be delivered at all), Steve. ^ permalink raw reply [flat|nested] 234+ messages in thread
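One way to get the two properties asked for above, ordering that is explicit in the messages and some indication that messages were lost since the last read, is a sequence number in each message. The following is a minimal sketch of that idea only; struct event_hdr, struct event_reader and reader_account() are hypothetical names, and it assumes a single reader and a sender that advances the counter even for events it has to drop. It is not the mechanism of the posted patches, which mark the queue as overrun instead.

```c
#include <stdbool.h>
#include <stdint.h>

struct event_hdr {          /* hypothetical per-message header */
    uint32_t seq;           /* advanced by the sender for every event generated */
    uint32_t type;          /* event type, unused by the accounting below */
};

struct event_reader {       /* per-reader bookkeeping */
    bool     primed;        /* false until the first message has been seen */
    uint32_t next_seq;      /* sequence number expected next */
};

/*
 * Account for one received message. Returns how many events were lost
 * between the previously seen message and this one (0 if none).
 */
static uint32_t reader_account(struct event_reader *rd,
                               const struct event_hdr *hdr)
{
    uint32_t lost = 0;

    if (rd->primed)
        lost = hdr->seq - rd->next_seq;  /* unsigned subtraction survives wraparound */
    rd->primed = true;
    rd->next_seq = hdr->seq + 1;
    return lost;
}
```

A reader would call reader_account() on every message it pulls out of the queue and treat a non-zero result as "state may be stale, resynchronise", which is all the message above asks for.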
* Re: Why add the general notification queue and its sources
  2019-09-06 16:12 ` Steven Whitehouse
@ 2019-09-06 17:07 ` Linus Torvalds
From: Linus Torvalds @ 2019-09-06 17:07 UTC
To: Steven Whitehouse
Cc: David Howells, Ray Strode, Greg Kroah-Hartman, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood

On Fri, Sep 6, 2019 at 9:12 AM Steven Whitehouse <swhiteho@redhat.com> wrote:
>
> The events are generally not independent - we would need ordering either
> implicit in the protocol or explicit in the messages.

Note that pipes certainly would never re-order messages. It's just
that _if_ you have two independent and concurrent readers of the same
pipe, they could read one message each, and you couldn't tell which
was first in user space.

Of course, I would suggest that anything that actually has
non-independent messages should always use a sequence number or
something like that in the message anyway.

But then it would have to be on a protocol level. And it's not clear
that all notifications need it. If it's just a random "things changed"
notification, maybe that doesn't need a sequence number or anything
else. So being on a protocol/data stream level might be the right
thing regardless.

Another possibility is to just say "don't do that then". If you want
multiple concurrent readers, open multiple pipes for them and use
separate events, and be happy in the knowledge that you don't have any
complicated cases.

> We also need to
> know in case messages are dropped too - doesn't need to be anything
> fancy, just some idea that since we last did a read, there are messages
> that got lost, most likely due to buffer overrun.

Pipes don't have that, but another flag certainly wouldn't be _hard_ to add.

But one problem (and this is fundamental) is that while O_DIRECT works
today (and works with kernels going back years), any new features like
overflow notification would obviously not work with legacy kernels.

On the user write side, with an O_NONBLOCK pipe, you currently just
get an -EAGAIN, so you _see_ the drop happening. But (again) there's
no sticky flag for it anywhere else, and there's no clean automatic
way for the reader to see "ok, the writer overflowed".

That's not a problem for any future extensions - the feature sounds
like a new flag and a couple of lines to do it - but it's a problem
for the whole "prototype in user space using existing pipe support"
that I personally find so nice, and which I think is such a good way
to prove the user space _need_ for anything like this.

But if people are ok with the pipe model in theory, _that_ kind of
small and directed feature I have absolutely no problem with adding.
It's just whole new untested character mode drivers with odd semantics
that I find troublesome.

Hmm. Maybe somebody can come up with a good legacy signaling solution
(and "just use another pipe for error notification and OOB data" for
the first one may _work_, but that sounds pretty hacky and just not
very convenient).

              Linus
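As an illustration of the sequence-number suggestion above, here is a minimal user-space sketch; it is not code from the thread. It assumes a hypothetical `struct evt_hdr` at the start of every packet and hypothetical helpers `evt_post()` and `evt_read()`. The writer advances the sequence number even when a non-blocking write fails with EAGAIN, so the reader can infer from the gap that something was dropped, without any new kernel flag.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical fixed header carried at the start of every packet. */
struct evt_hdr {
	uint32_t seq;	/* assigned by the writer to every event, queued or not */
	uint32_t type;	/* application-defined event type */
};

/* Writer state: the pipe's write end and the next sequence number. */
struct evt_writer {
	int fd;
	uint32_t next_seq;
};

/*
 * Post one event on a non-blocking O_DIRECT pipe.  Returns 0 if the
 * packet was queued and -1 if it was dropped (errno is EAGAIN when the
 * pipe was full).  The sequence number advances in both cases, which
 * is what makes the drop visible to the reader.
 */
static int evt_post(struct evt_writer *w, uint32_t type)
{
	struct evt_hdr h = { .seq = w->next_seq++, .type = type };
	ssize_t n = write(w->fd, &h, sizeof(h));

	return n == (ssize_t)sizeof(h) ? 0 : -1;
}

/*
 * Read one packet.  Returns 1 on success and 0 if nothing usable was
 * read.  On success *lost holds the number of events missing between
 * the previous packet and this one (unsigned wrap-around is harmless).
 */
static int evt_read(int fd, uint32_t *expect_seq, struct evt_hdr *out,
		    uint32_t *lost)
{
	ssize_t n = read(fd, out, sizeof(*out));

	if (n != (ssize_t)sizeof(*out))
		return 0;
	*lost = out->seq - *expect_seq;
	*expect_seq = out->seq + 1;
	return 1;
}
```

This only works with a single writer that owns the counter, and the reader learns about a drop only when the next event arrives, so it does not replace the sticky overflow flag discussed above.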
* Re: Why add the general notification queue and its sources
  2019-09-06 17:07 ` Linus Torvalds
@ 2019-09-06 17:14 ` Linus Torvalds
From: Linus Torvalds @ 2019-09-06 17:14 UTC
To: Steven Whitehouse
Cc: David Howells, Ray Strode, Greg Kroah-Hartman, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood

On Fri, Sep 6, 2019 at 10:07 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Hmm. Maybe somebody can come up with a good legacy signaling solution
> (and "just use another pipe for error notification and OOB data" for
> the first one may _work_, but that sounds pretty hacky and just not
> very convenient).

... actually, maybe the trivial solution for at least some prototyping
cases is to make any user mode writers never drop messages. Don't use
a non-blocking fd for the write direction.

That's obviously *not* acceptable for a kernel writer, and it's not
acceptable for an actual system daemon writer (that you could block by
just not reading the notifications), but it's certainly acceptable for
the "let's prototype having kernel support for /proc/mounts
notifications using a local thread that just polls for it every few
seconds".

So at least for _some_ prototypes you can probably just ignore the
overflow issue. It won't get you full test coverage, but it will get
you a working legacy solution and a "look, if we have kernel support
for this, we can do better".

              Linus
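The polling prototype described above could look roughly like the following sketch, which is not from the thread. The helper names `hash_file()` and `poll_once()` and the one-byte "M" payload are assumptions made for illustration. The write end is a blocking O_DIRECT pipe, so the writer waits for the reader instead of dropping an event.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* FNV-1a hash of a file's contents; returns 0 on success, -1 if unreadable. */
static int hash_file(const char *path, uint64_t *hash)
{
	FILE *f = fopen(path, "r");
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */
	int c;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF) {
		h ^= (unsigned char)c;
		h *= 0x100000001b3ULL;		/* FNV-1a 64-bit prime */
	}
	fclose(f);
	*hash = h;
	return 0;
}

/*
 * One polling step: if the file's hash differs from *last, remember the
 * new hash and send a one-byte "changed" packet on wfd.  wfd is expected
 * to be the *blocking* write end of an O_DIRECT pipe, so a full pipe
 * stalls this thread rather than losing the notification.
 * Returns 1 if a notification was sent, 0 if unchanged, -1 on error.
 */
static int poll_once(const char *path, uint64_t *last, int wfd)
{
	uint64_t now;

	if (hash_file(path, &now) < 0)
		return -1;
	if (now == *last)
		return 0;
	*last = now;
	return write(wfd, "M", 1) == 1 ? 1 : -1;
}
```

A prototype would run `poll_once("/proc/mounts", &last, wfd)` from a dedicated thread with a `sleep()` of a few seconds between calls. As noted above, this is only suitable when the writer may safely be blocked by a slow reader.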
* Re: Why add the general notification queue and its sources
  2019-09-06 17:14 ` Linus Torvalds
@ 2019-09-06 21:19 ` David Howells
From: David Howells @ 2019-09-06 21:19 UTC
To: Theodore Y. Ts'o
Cc: dhowells, Linus Torvalds, Steven Whitehouse, Ray Strode, Greg Kroah-Hartman, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood

Theodore Y. Ts'o <tytso@mit.edu> wrote:

> Something else which we should consider up front is how to handle the
> case where you have multiple userspace processes that want to
> subscribe to the same notification.

I have that.

> This also implies that we'll need to have some kind of standard header
> at the beginning to specify the source of a particular notification
> message.

That too.

David
* Re: Why add the general notification queue and its sources
  2019-09-06 16:12 ` Steven Whitehouse
@ 2019-09-06 17:14 ` Andy Lutomirski
From: Andy Lutomirski @ 2019-09-06 17:14 UTC
To: Steven Whitehouse
Cc: Linus Torvalds, David Howells, Ray Strode, Greg Kroah-Hartman, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, Al Viro, Ray, Debarshi, Robbie Harwood

> On Sep 6, 2019, at 9:12 AM, Steven Whitehouse <swhiteho@redhat.com> wrote:
>
> Hi,
>
>> On 06/09/2019 16:53, Linus Torvalds wrote:
>> On Fri, Sep 6, 2019 at 8:35 AM Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>> This is why I like pipes. You can use them today. They are simple, and
>>> extensible, and you don't need to come up with a new subsystem and
>>> some untested ad-hoc thing that nobody has actually used.
>> The only _real_ complexity is to make sure that events are reliably parseable.
>>
>> That's where you really want to use the Linux-only "packet pipe"
>> thing, because otherwise you have to have size markers or other things
>> to delineate events. But if you do that, then it really becomes
>> trivial.
>>
>> And I checked, we made it available to user space, even if the
>> original reason for that code was kernel-only autofs use: you just
>> need to make the pipe be O_DIRECT.
>>
>> This overly stupid program shows off the feature:
>>
>>         #define _GNU_SOURCE
>>         #include <fcntl.h>
>>         #include <unistd.h>
>>
>>         int main(int argc, char **argv)
>>         {
>>                 int fd[2];
>>                 char buf[10];
>>
>>                 pipe2(fd, O_DIRECT | O_NONBLOCK);
>>                 write(fd[1], "hello", 5);
>>                 write(fd[1], "hi", 2);
>>                 read(fd[0], buf, sizeof(buf));
>>                 read(fd[0], buf, sizeof(buf));
>>                 return 0;
>>         }
>>
>> and if you strace it (because I was too lazy to add error handling or
>> printing of results), you'll see
>>
>>         write(4, "hello", 5)                    = 5
>>         write(4, "hi", 2)                       = 2
>>         read(3, "hello", 10)                    = 5
>>         read(3, "hi", 10)                       = 2
>>
>> note how you got packets of data on the reader side, instead of
>> getting the traditional "just buffer it as a stream".
>>
>> So now you can even have multiple readers of the same event pipe, and
>> packetization is obvious and trivial. Of course, I'm not sure why
>> you'd want to have multiple readers, and you'd lose _ordering_, but if
>> all events are independent, this _might_ be a useful thing in a
>> threaded environment. Maybe.
>>
>> (Side note: a zero-sized write will not cause a zero-sized packet. It
>> will just be dropped).
>>
>>               Linus
>
> The events are generally not independent - we would need ordering either implicit in the protocol or explicit in the messages. We also need to know in case messages are dropped too - doesn't need to be anything fancy, just some idea that since we last did a read, there are messages that got lost, most likely due to buffer overrun.

This could be a bit fancier: if the pipe recorded the bitwise OR of the first few bytes of each dropped message, then the messages could set a bit in the header indicating the type, and readers could then learn which *types* of messages were dropped.

Or they could just use multiple pipes.

If this whole mechanism catches on, I wonder if implementing recvmmsg() on pipes would be worthwhile.
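The dropped-type mask idea can be emulated in user space to see how it would behave; the sketch below is not from the thread and does not reflect any existing kernel interface. The event bits `EVT_MOUNT`, `EVT_KEY` and `EVT_USB`, the `struct typed_pipe` wrapper and both helpers are hypothetical. Here the writer, rather than the pipe itself, keeps the OR of the type word of every message that did not fit.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical convention: the first word of a message has one type bit set. */
#define EVT_MOUNT	(1u << 0)
#define EVT_KEY		(1u << 1)
#define EVT_USB		(1u << 2)

struct typed_pipe {
	int wfd;		/* non-blocking O_DIRECT write end */
	uint32_t dropped_types;	/* OR of the type word of each dropped message */
};

/*
 * Try to queue a message consisting only of its type word.  On overflow
 * the message is lost, but its type bit is remembered in the mask.
 * Returns 0 if queued, -1 if dropped.
 */
static int typed_post(struct typed_pipe *p, uint32_t type_bit)
{
	ssize_t n = write(p->wfd, &type_bit, sizeof(type_bit));

	if (n == (ssize_t)sizeof(type_bit))
		return 0;
	p->dropped_types |= type_bit;
	return -1;
}

/* Fetch and clear the mask, as a reader would after noticing an overrun. */
static uint32_t typed_take_dropped(struct typed_pipe *p)
{
	uint32_t mask = p->dropped_types;

	p->dropped_types = 0;
	return mask;
}
```

A reader that sees `EVT_MOUNT` in the returned mask would know to rescan mount state, while leaving key and USB state alone if those bits are clear.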
* Re: Why add the general notification queue and its sources
  2019-09-05 17:19 ` Linus Torvalds
@ 2019-09-05 18:37 ` Steven Whitehouse
From: Steven Whitehouse @ 2019-09-05 18:37 UTC
To: Linus Torvalds, David Howells
Cc: Greg Kroah-Hartman, rstrode, Nicolas Dichtel, raven, keyrings, linux-usb, linux-block, Christian Brauner, LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing, David Lehman, Ian Kent

Hi,

On 05/09/2019 18:19, Linus Torvalds wrote:
> On Thu, Sep 5, 2019 at 10:01 AM David Howells <dhowells@redhat.com> wrote:
>>> I'm just going to be very blunt about this, and say that there is no
>>> way I can merge any of this *ever*, unless other people stand up and
>>> say that
>>>
>>> (a) they'll use it
>>>
>>> and
>>>
>>> (b) they'll actively develop it and participate in testing and coding
>> Besides the core notification buffer which ties this together, there are a
>> number of sources that I've implemented, not all of which are in this patch
>> series:
> You've at least now answered part of the "Why", but you didn't
> actually answer the whole "another developer" part.
>
> I really don't like how nobody other than you seems to even look at any
> of the key handling patches. Because nobody else seems to care.
>
> This seems to be another new subsystem / driver that has the same
> pattern. If it's all just you, I don't want to merge it, because I
> really want more than just other developers doing "Reviewed-by" after
> looking at somebody else's code that they don't actually use or really
> care about.
>
> See what I'm saying?
>
> New features that go into the kernel should have multiple users. Not a
> single developer who pushes both the kernel feature and the single use
> of that feature.
>
> This very much comes from me reverting the key ACL pull. Not only did
> I revert it, ABSOLUTELY NOBODY even reacted to the revert. Nobody
> stepped up and said that they want that new ACL code, and pushed for a
> fix. There was some very little murmuring about it when Mimi at least
> figured out _why_ it broke, but other than that all the noise I saw
> about the revert was Eric Biggers pointing out it broke other things
> too, and that it had actually broken some test suites. But since it
> hadn't even been in linux-next, that too had been noticed much too
> late.
>
> See what I'm saying? This whole "David Howells does his own features
> that nobody else uses" needs to stop. You need to have a champion. I
> just don't feel safe pulling these kinds of changes from you, because
> I get the feeling that ABSOLUTELY NOBODY ELSE ever really looked at it
> or really cared.
>
> Most of the patches have nobody else even Cc'd, and even the patches
> that do have some "Reviewed-by" feel more like somebody else went "ok,
> the change looks fine to me", without any other real attachment to the
> code.
>
> New kernel features and interfaces really need to have a higher
> barrier of entry than one developer working on his or her own thing.
>
> Is that a change from 25 years ago? Oh yes it is. We can point to lots
> of "single developer did a thing" from years past. But things have
> changed. And once bitten, twice shy: I really am a _lot_ more nervous
> about all these key changes now.
>
>                      Linus

There are a number of potential users, some waiting just to have a
mechanism to avoid the racy alternatives to (for example) parsing
/proc/mounts repeatedly, others perhaps a bit further away, but who have
nonetheless expressed interest in having an interface which allows
notifications for mounts.

The subject of mount notifications has been discussed at LSF/MM in the
past too, I proposed it as a topic a little while back:
https://www.spinics.net/lists/linux-block/msg07653.html and David's
patch set is a potential solution to some of the issues that I raised there.

The original series for the new mount API came from an idea of Al/Miklos
which was also presented at LSF/MM 2017, and this is a follow-on
project. So it has not come out of nowhere, but has been something that
has been discussed in various forums over a period of time.

Originally, there was a proposal to use netlink for the notifications,
however that didn't seem to meet with general approval, even though Ian
Kent did some work towards figuring out whether that would be a useful
direction to go in.

David has since come up with the proposal presented here, which is
intended to improve on the original proposal in various ways - mostly
making the notifications more efficient (i.e. smaller) and also generic
enough that it might have uses beyond the original intent of just being
a mount notification mechanism.

The original reason for the mount notification mechanism was so that we
are able to provide information to GUIs and similar filesystem and
storage management tools, matching the state of the filesystem with the
state of the underlying devices. This is part of a larger project
entitled "Project Springfield" to try and provide better management
tools for storage and filesystems. I've copied David Lehman in, since he
can provide a wider view on this topic.

It is something that I do expect will receive wide use, and which will
be tested carefully. I know that Ian Kent has started work on some
support for libmount for example, even outside of autofs.

We do regularly hear from customers that better storage and filesystem
management tools are something that they consider very important, so
that is why we are spending such a lot of effort in trying to improve
the support in this area.

I'm not sure if that really answers your question, except to say that it
is something that is much more than a personal project of David's and
that other people do care about it too,

Steve.
* Re: Why add the general notification queue and its sources
  2019-09-05 18:37 ` Steven Whitehouse
@ 2019-09-05 18:51 ` Ray Strode
  -1 siblings, 0 replies; 234+ messages in thread
From: Ray Strode @ 2019-09-05 18:51 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Linus Torvalds, David Howells, Greg Kroah-Hartman, Nicolas Dichtel,
	raven, keyrings, linux-usb, linux-block, Christian Brauner,
	LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing,
	David Lehman, Ian Kent

Hi,

On Thu, Sep 5, 2019 at 2:37 PM Steven Whitehouse <swhiteho@redhat.com> wrote:
> The original reason for the mount notification mechanism was so that we
> are able to provide information to GUIs and similar filesystem and
> storage management tools, matching the state of the filesystem with the
> state of the underlying devices. This is part of a larger project
> entitled "Project Springfield" to try and provide better management
> tools for storage and filesystems. I've copied David Lehman in, since he
> can provide a wider view on this topic.

So one problem that I've heard discussed before is what happens in a thinp
setup when the disk space is overallocated and gets used up. IIRC, the
volumes just sort of eat themselves?

Getting proper notification of looming catastrophic failure to the
workstation user before it's too late would be useful, indeed.

I don't know if this new mechanism dhowells has in development can help
with that, and/or if solving that problem is part of the Project
Springfield initiative or not. Do you know off hand?

--Ray

^ permalink raw reply	[flat|nested] 234+ messages in thread
* Re: Why add the general notification queue and its sources
  2019-09-05 18:51 ` Ray Strode
@ 2019-09-05 20:09 ` David Lehman
  -1 siblings, 0 replies; 234+ messages in thread
From: David Lehman @ 2019-09-05 20:09 UTC (permalink / raw)
  To: Ray Strode, Steven Whitehouse
  Cc: Linus Torvalds, David Howells, Greg Kroah-Hartman, Nicolas Dichtel,
	raven, keyrings, linux-usb, linux-block, Christian Brauner,
	LSM List, linux-fsdevel, Linux API, Linux List Kernel Mailing,
	Ian Kent

On Thu, 2019-09-05 at 14:51 -0400, Ray Strode wrote:
> Hi,
>
> On Thu, Sep 5, 2019 at 2:37 PM Steven Whitehouse <swhiteho@redhat.com> wrote:
> > The original reason for the mount notification mechanism was so that we
> > are able to provide information to GUIs and similar filesystem and
> > storage management tools, matching the state of the filesystem with the
> > state of the underlying devices. This is part of a larger project
> > entitled "Project Springfield" to try and provide better management
> > tools for storage and filesystems. I've copied David Lehman in, since he
> > can provide a wider view on this topic.
> So one problem that I've heard discussed before is what happens in a thinp
> setup when the disk space is overallocated and gets used up. IIRC, the
> volumes just sort of eat themselves?
>
> Getting proper notification of looming catastrophic failure to the
> workstation user before it's too late would be useful, indeed.
>
> I don't know if this new mechanism dhowells has in development can help
> with that,

My understanding is that there is already a dm devent that gets sent
when the low water mark is crossed for a thin pool, but there is nothing
in userspace that knows how to effectively get the user's attention at
that time.

> and/or if solving that problem is part of the Project Springfield
> initiative or not. Do you know off hand?

We have been looking into building a userspace event notification
service (for storage, initially) to aggregate and add context to
low-level events such as these, providing a single source for all kinds
of storage events with an excellent signal:noise ratio. Thin pool
exhaustion is high on the list of problems we would want to address.

David

^ permalink raw reply	[flat|nested] 234+ messages in thread
* Re: Why add the general notification queue and its sources
  2019-09-05 17:01 ` David Howells
@ 2019-09-05 18:33 ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 234+ messages in thread
From: Greg Kroah-Hartman @ 2019-09-05 18:33 UTC (permalink / raw)
  To: David Howells
  Cc: Linus Torvalds, rstrode, swhiteho, nicolas.dichtel, raven,
	keyrings, linux-usb, linux-block, Christian Brauner, LSM List,
	linux-fsdevel, Linux API, Linux List Kernel Mailing

On Thu, Sep 05, 2019 at 06:01:47PM +0100, David Howells wrote:
>  (2) USB notifications.
>
>      GregKH was looking for a way to do USB notifications as I was looking to
>      find additional sources to implement. I'm not sure how he wants to use
>      them, but I'll let him speak to that himself.

We are getting people asking for all sorts of "error reporting" events
that can happen in the USB subsystem that we have started to abuse the
KOBJ_CHANGE uevent notification for.

At the same time your patches were submitted, someone else submitted
yet-another-USB-error patchset. This type of user/kernel interface is
much easier to use than abusing uevents for USB errors and general
notifications about what happened with USB devices (more than just
add/remove that uevents have).

So yes, I would like this, and I am sure the ChromeOS people would like
it too given that I rejected their patchset with the assumption that
this could be done with the notification queue api "soon" :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 234+ messages in thread
[parent not found: <20190903085706.7700-1-hdanton@sina.com>]
* [PATCH 00/11] Keyrings, Block and USB notifications [ver #7]
  2019-08-29 18:29 ` David Howells
  (?)
@ 2019-08-30 13:57 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 13:57 UTC (permalink / raw)
  To: viro
  Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings,
	linux-usb, linux-security-module, linux-fsdevel, linux-api,
	linux-block, linux-security-module, linux-kernel


Here's a set of patches to add a general notification queue concept and to
add sources of events for:

 (1) Key/keyring events, such as creating, linking and removal of keys.

 (2) General device events (single common queue) including:

     - Block layer events, such as device errors

     - USB subsystem events, such as device/bus attach/remove, device
       reset, device errors.

Tests for the key/keyring events can be found on the keyutils next branch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next

Notifications are done automatically inside of the testing infrastructure
on every change that every test makes to a key or keyring.

Manual pages can be found there also, including pages for watch_queue(7)
and the watch_devices(2) system call (these should be transferred to the
manpages package if taken upstream).

LSM hooks are included:

 (1) A set of hooks are provided that allow an LSM to rule on whether or
     not a watch may be set. Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable. The LSM
     should use current's credentials. [Wanted by SELinux & Smack]

 (2) A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue. This is given
     the credentials from the event generator (which may be the system) and
     the watch setter. [Wanted by Smack]

I've provided a preliminary attempt to provide SELinux and Smack with
implementations of some of these hooks.


Design decisions:

 (1) A misc chardev is used to create and open a ring buffer:

	fd = open("/dev/watch_queue", O_RDWR);

     which is then configured and mmap'd into userspace:

	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);

     The fd cannot be read or written (though there is a facility to use
     write to inject records for debugging) and userspace just pulls data
     directly out of the buffer.

 (2) The ring index pointers are stored inside the ring and are thus
     accessible to userspace. Userspace should only update the tail
     pointer and never the head pointer or risk breaking the buffer. The
     kernel checks that the pointers appear valid before trying to use
     them. A 'skip' record is maintained around the pointers.

 (3) poll() can be used to wait for data to appear in the buffer.

 (4) Records in the buffer are binary, typed and have a length so that they
     can be of varying size.

     This means that multiple heterogeneous sources can share a common
     buffer. Tags may be specified when a watchpoint is created to help
     distinguish the sources.

 (5) The queue is reusable as there are 16 million types available, of
     which I've used just a few, so there is scope for others to be used.

 (6) Records are filterable as types have up to 256 subtypes that can be
     individually filtered. Other filtration is also available.

 (7) Each time the buffer is opened, a new buffer is created - this means
     that there's no interference between watchers.

 (8) When recording a notification, the kernel will not sleep, but will
     rather mark a queue as overrun if there's insufficient space, thereby
     avoiding userspace causing the kernel to hang.

 (9) The 'watchpoint' should be specific where possible, meaning that you
     specify the object that you want to watch.

(10) The buffer is created and then watchpoints are attached to it, using
     one of:

	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
	watch_devices(fd, 0x02, 0);

     where in both cases, fd indicates the queue and the number after is a
     tag between 0 and 255.

(11) The watch must be removed if either the watch buffer is destroyed or
     the watched object is destroyed.


Things I want to avoid:

 (1) Introducing features that make the core VFS dependent on the network
     stack or networking namespaces (ie. usage of netlink).

 (2) Dumping all this stuff into dmesg and having a daemon that sits there
     parsing the output and distributing it as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky. Further, dmesg might not exist or might be
     inaccessible inside a container.

 (3) Letting users see events they shouldn't be able to see.


The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications-core

Changes:

 ver #7:

 (*) Removed the 'watch' argument from the security_watch_key() and
     security_watch_devices() hooks as current_cred() can be used instead
     of watch->cred.

 ver #6:

 (*) Fix mmap bug in watch_queue driver.

 (*) Add an extended removal notification that can transmit an identifier
     to userspace (such as a key ID).

 (*) Don't produce an instantiation notification in mark_key_instantiated()
     but rather do it in the caller to prevent key updates from producing
     an instantiate notification as well as an update notification.

 (*) Set the right number of filters in the sample program.

 (*) Provide preliminary hook implementations for SELinux and Smack.

 ver #5:

 (*) Split the superblock watch and mount watch parts out into their own
     branch (notifications-mount) as they really need certain fsinfo()
     attributes.

 (*) Rearrange the watch notification UAPI header to push the length down
     to bits 0-5 and remove the lost-message bits. The userspace's watch
     ID tag is moved to bits 8-15 and then the message type is allocated
     all of bits 16-31 for its own purposes.

     The lost-message bit is moved over to the header, rather than being
     placed in the next message to be generated and given its own word so
     it can be cleared with xchg(,0) for parisc.

 (*) The security_post_notification() hook is no longer called with the
     spinlock held and softirqs disabled - though the RCU readlock is still
     held.

 (*) Buffer pages are now accounted towards RLIMIT_MEMLOCK and CAP_IPC_LOCK
     will skip the overuse check.

 (*) The buffer is marked VM_DONTEXPAND.

 (*) Save the watch-setter's creds in struct watch and give that to the LSM
     hook for posting a message.

 ver #4:

 (*) Split the basic UAPI bits out into their own patch and then split the
     LSM hooks out into an intermediate patch. Add LSM hooks for setting
     watches.

     Rename the *_notify() system calls to watch_*() for consistency.

 ver #3:

 (*) I've added a USB notification source and reformulated the block
     notification source so that there's now a common watch list, for which
     the system call is now device_notify().

     I've assigned a pair of unused ioctl numbers in the 'W' series to the
     ioctls added by this series.

     I've also added a description of the kernel API to the documentation.

 ver #2:

 (*) I've fixed various issues raised by Jann Horn and GregKH and moved to
     krefs for refcounting. I've added some security features to try and
     give Casey Schaufler the LSM control he wants.
David
---
David Howells (11):
      uapi: General notification ring definitions
      security: Add hooks to rule on setting a watch
      security: Add a hook for the point of notification insertion
      General notification queue with user mmap()'able ring buffer
      keys: Add a notification facility
      Add a general, global device notification watch list
      block: Add block layer notifications
      usb: Add USB subsystem notifications
      Add sample notification program
      selinux: Implement the watch_key security hook
      smack: Implement the watch_key and post_notification hooks [untested]


 Documentation/ioctl/ioctl-number.rst        |    1
 Documentation/security/keys/core.rst        |   58 ++
 Documentation/watch_queue.rst               |  460 ++++++++++++++
 arch/alpha/kernel/syscalls/syscall.tbl      |    1
 arch/arm/tools/syscall.tbl                  |    1
 arch/ia64/kernel/syscalls/syscall.tbl       |    1
 arch/m68k/kernel/syscalls/syscall.tbl       |    1
 arch/microblaze/kernel/syscalls/syscall.tbl |    1
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1
 arch/parisc/kernel/syscalls/syscall.tbl     |    1
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1
 arch/s390/kernel/syscalls/syscall.tbl       |    1
 arch/sh/kernel/syscalls/syscall.tbl         |    1
 arch/sparc/kernel/syscalls/syscall.tbl      |    1
 arch/x86/entry/syscalls/syscall_32.tbl      |    1
 arch/x86/entry/syscalls/syscall_64.tbl      |    1
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1
 block/Kconfig                               |    9
 block/blk-core.c                            |   29 +
 drivers/base/Kconfig                        |    9
 drivers/base/Makefile                       |    1
 drivers/base/watch.c                        |   90 +++
 drivers/misc/Kconfig                        |   13
 drivers/misc/Makefile                       |    1
 drivers/misc/watch_queue.c                  |  893 +++++++++++++++++++++++++++
 drivers/usb/core/Kconfig                    |    9
 drivers/usb/core/devio.c                    |   56 ++
 drivers/usb/core/hub.c                      |    4
 include/linux/blkdev.h                      |   15
 include/linux/device.h                      |    7
 include/linux/key.h                         |    3
 include/linux/lsm_audit.h                   |    1
 include/linux/lsm_hooks.h                   |   38 +
 include/linux/security.h                    |   32 +
 include/linux/syscalls.h                    |    1
 include/linux/usb.h                         |   18 +
 include/linux/watch_queue.h                 |   94 +++
 include/uapi/asm-generic/unistd.h           |    4
 include/uapi/linux/keyctl.h                 |    2
 include/uapi/linux/watch_queue.h            |  183 ++++++
 kernel/sys_ni.c                             |    1
 samples/Kconfig                             |    6
 samples/Makefile                            |    1
 samples/watch_queue/Makefile                |    8
 samples/watch_queue/watch_test.c            |  233 +++++++
 security/keys/Kconfig                       |    9
 security/keys/compat.c                      |    3
 security/keys/gc.c                          |    5
 security/keys/internal.h                    |   30 +
 security/keys/key.c                         |   38 +
 security/keys/keyctl.c                      |   99 +++
 security/keys/keyring.c                     |   20 -
 security/keys/request_key.c                 |    4
 security/security.c                         |   23 +
 security/selinux/hooks.c                    |   14
 security/smack/smack_lsm.c                  |   82 ++
 58 files changed, 2593 insertions(+), 30 deletions(-)
 create mode 100644 Documentation/watch_queue.rst
 create mode 100644 drivers/base/watch.c
 create mode 100644 drivers/misc/watch_queue.c
 create mode 100644 include/linux/watch_queue.h
 create mode 100644 include/uapi/linux/watch_queue.h
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c

^ permalink raw reply	[flat|nested] 234+ messages in thread
* [PATCH 00/11] Keyrings, Block and USB notifications [ver #7]
From: David Howells @ 2019-08-30 13:57 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner


Here's a set of patches to add a general notification queue concept and to
add sources of events for:

 (1) Key/keyring events, such as creating, linking and removal of keys.

 (2) General device events (single common queue) including:

     - Block layer events, such as device errors

     - USB subsystem events, such as device/bus attach/remove, device
       reset, device errors.

Tests for the key/keyring events can be found on the keyutils next branch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next

Notifications are done automatically inside of the testing infrastructure
on every change that each test makes to a key or keyring.

Manual pages can be found there also, including pages for watch_queue(7)
and the watch_devices(2) system call (these should be transferred to the
manpages package if taken upstream).

LSM hooks are included:

 (1) A set of hooks is provided that allows an LSM to rule on whether or
     not a watch may be set.  Each of these hooks takes a different
     "watched object" parameter, so they're not really shareable.  The LSM
     should use current's credentials.  [Wanted by SELinux & Smack]

 (2) A hook is provided to allow an LSM to rule on whether or not a
     particular message may be posted to a particular queue.  This is
     given the credentials from the event generator (which may be the
     system) and the watch setter.  [Wanted by Smack]

I've made a preliminary attempt to provide SELinux and Smack with
implementations of some of these hooks.


Design decisions:

 (1) A misc chardev is used to create and open a ring buffer:

	fd = open("/dev/watch_queue", O_RDWR);

     which is then configured and mmap'd into userspace:

	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);

     The fd cannot be read or written (though there is a facility to use
     write() to inject records for debugging) and userspace just pulls
     data directly out of the buffer.

 (2) The ring index pointers are stored inside the ring and are thus
     accessible to userspace.  Userspace should only update the tail
     pointer and never the head pointer or risk breaking the buffer.  The
     kernel checks that the pointers appear valid before trying to use
     them.  A 'skip' record is maintained around the pointers.

 (3) poll() can be used to wait for data to appear in the buffer.

 (4) Records in the buffer are binary, typed and have a length so that
     they can be of varying size.

     This means that multiple heterogeneous sources can share a common
     buffer.  Tags may be specified when a watchpoint is created to help
     distinguish the sources.

 (5) The queue is reusable as there are 16 million types available, of
     which I've used just a few, so there is scope for others to be used.

 (6) Records are filterable as types have up to 256 subtypes that can be
     individually filtered.  Other filtration is also available.

 (7) Each time the buffer is opened, a new buffer is created - this means
     that there's no interference between watchers.

 (8) When recording a notification, the kernel will not sleep, but will
     rather mark a queue as overrun if there's insufficient space, thereby
     avoiding userspace causing the kernel to hang.

 (9) The 'watchpoint' should be specific where possible, meaning that you
     specify the object that you want to watch.

(10) The buffer is created and then watchpoints are attached to it, using
     one of:

	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);
	watch_devices(fd, 0x02, 0);

     where in both cases, fd indicates the queue and the number after is a
     tag between 0 and 255.

(11) The watch must be removed if either the watch buffer is destroyed or
     the watched object is destroyed.


Things I want to avoid:

 (1) Introducing features that make the core VFS dependent on the network
     stack or networking namespaces (ie. usage of netlink).

 (2) Dumping all this stuff into dmesg and having a daemon that sits there
     parsing the output and distributing it, as this then puts the
     responsibility for security into userspace and makes handling
     namespaces tricky.  Further, dmesg might not exist or might be
     inaccessible inside a container.

 (3) Letting users see events they shouldn't be able to see.


The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications-core

Changes:

 ver #7:

 (*) Removed the 'watch' argument from the security_watch_key() and
     security_watch_devices() hooks as current_cred() can be used instead
     of watch->cred.

 ver #6:

 (*) Fix mmap bug in watch_queue driver.

 (*) Add an extended removal notification that can transmit an identifier
     to userspace (such as a key ID).

 (*) Don't produce an instantiation notification in
     mark_key_instantiated() but rather do it in the caller to prevent key
     updates from producing an instantiate notification as well as an
     update notification.

 (*) Set the right number of filters in the sample program.

 (*) Provide preliminary hook implementations for SELinux and Smack.

 ver #5:

 (*) Split the superblock watch and mount watch parts out into their own
     branch (notifications-mount) as they really need certain fsinfo()
     attributes.

 (*) Rearrange the watch notification UAPI header to push the length down
     to bits 0-5 and remove the lost-message bits.  The userspace's watch
     ID tag is moved to bits 8-15 and then the message type is allocated
     all of bits 16-31 for its own purposes.

     The lost-message bit is moved over to the header, rather than being
     placed in the next message to be generated, and given its own word so
     it can be cleared with xchg(,0) for parisc.

 (*) The security_post_notification() hook is no longer called with the
     spinlock held and softirqs disabled - though the RCU readlock is
     still held.

 (*) Buffer pages are now accounted towards RLIMIT_MEMLOCK and
     CAP_IPC_LOCK will skip the overuse check.

 (*) The buffer is marked VM_DONTEXPAND.

 (*) Save the watch-setter's creds in struct watch and give that to the
     LSM hook for posting a message.

 ver #4:

 (*) Split the basic UAPI bits out into their own patch and then split the
     LSM hooks out into an intermediate patch.  Add LSM hooks for setting
     watches.

     Rename the *_notify() system calls to watch_*() for consistency.

 ver #3:

 (*) I've added a USB notification source and reformulated the block
     notification source so that there's now a common watch list, for
     which the system call is now device_notify().

     I've assigned a pair of unused ioctl numbers in the 'W' series to the
     ioctls added by this series.

     I've also added a description of the kernel API to the documentation.

 ver #2:

 (*) I've fixed various issues raised by Jann Horn and GregKH and moved to
     krefs for refcounting.  I've added some security features to try and
     give Casey Schaufler the LSM control he wants.

David
---
David Howells (11):
      uapi: General notification ring definitions
      security: Add hooks to rule on setting a watch
      security: Add a hook for the point of notification insertion
      General notification queue with user mmap()'able ring buffer
      keys: Add a notification facility
      Add a general, global device notification watch list
      block: Add block layer notifications
      usb: Add USB subsystem notifications
      Add sample notification program
      selinux: Implement the watch_key security hook
      smack: Implement the watch_key and post_notification hooks [untested]


 Documentation/ioctl/ioctl-number.rst        |   1
 Documentation/security/keys/core.rst        |  58 ++
 Documentation/watch_queue.rst               | 460 ++++++++++++++
 arch/alpha/kernel/syscalls/syscall.tbl      |   1
 arch/arm/tools/syscall.tbl                  |   1
 arch/ia64/kernel/syscalls/syscall.tbl       |   1
 arch/m68k/kernel/syscalls/syscall.tbl       |   1
 arch/microblaze/kernel/syscalls/syscall.tbl |   1
 arch/mips/kernel/syscalls/syscall_n32.tbl   |   1
 arch/mips/kernel/syscalls/syscall_n64.tbl   |   1
 arch/mips/kernel/syscalls/syscall_o32.tbl   |   1
 arch/parisc/kernel/syscalls/syscall.tbl     |   1
 arch/powerpc/kernel/syscalls/syscall.tbl    |   1
 arch/s390/kernel/syscalls/syscall.tbl       |   1
 arch/sh/kernel/syscalls/syscall.tbl         |   1
 arch/sparc/kernel/syscalls/syscall.tbl      |   1
 arch/x86/entry/syscalls/syscall_32.tbl      |   1
 arch/x86/entry/syscalls/syscall_64.tbl      |   1
 arch/xtensa/kernel/syscalls/syscall.tbl     |   1
 block/Kconfig                               |   9
 block/blk-core.c                            |  29 +
 drivers/base/Kconfig                        |   9
 drivers/base/Makefile                       |   1
 drivers/base/watch.c                        |  90 +++
 drivers/misc/Kconfig                        |  13
 drivers/misc/Makefile                       |   1
 drivers/misc/watch_queue.c                  | 893 +++++++++++++++++++++++++++
 drivers/usb/core/Kconfig                    |   9
 drivers/usb/core/devio.c                    |  56 ++
 drivers/usb/core/hub.c                      |   4
 include/linux/blkdev.h                      |  15
 include/linux/device.h                      |   7
 include/linux/key.h                         |   3
 include/linux/lsm_audit.h                   |   1
 include/linux/lsm_hooks.h                   |  38 +
 include/linux/security.h                    |  32 +
 include/linux/syscalls.h                    |   1
 include/linux/usb.h                         |  18 +
 include/linux/watch_queue.h                 |  94 +++
 include/uapi/asm-generic/unistd.h           |   4
 include/uapi/linux/keyctl.h                 |   2
 include/uapi/linux/watch_queue.h            | 183 ++++++
 kernel/sys_ni.c                             |   1
 samples/Kconfig                             |   6
 samples/Makefile                            |   1
 samples/watch_queue/Makefile                |   8
 samples/watch_queue/watch_test.c            | 233 +++++++
 security/keys/Kconfig                       |   9
 security/keys/compat.c                      |   3
 security/keys/gc.c                          |   5
 security/keys/internal.h                    |  30 +
 security/keys/key.c                         |  38 +
 security/keys/keyctl.c                      |  99 +++
 security/keys/keyring.c                     |  20 -
 security/keys/request_key.c                 |   4
 security/security.c                         |  23 +
 security/selinux/hooks.c                    |  14
 security/smack/smack_lsm.c                  |  82 ++
 58 files changed, 2593 insertions(+), 30 deletions(-)
 create mode 100644 Documentation/watch_queue.rst
 create mode 100644 drivers/base/watch.c
 create mode 100644 drivers/misc/watch_queue.c
 create mode 100644 include/linux/watch_queue.h
 create mode 100644 include/uapi/linux/watch_queue.h
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c

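For anyone wanting to picture the consumer side of design decisions (2) and (4) in the cover letter, the following is a minimal userspace sketch, not code from the series. It assumes that the record length in the low six bits of the info word counts ring slots and that skip records are simply stepped over. The function drain_ring(), its parameters and its return convention are hypothetical names made up for illustration; the struct and constants are local copies so that the sketch builds without the patched headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the UAPI bits used here, so the sketch builds standalone. */
#define WATCH_INFO_LENGTH		0x0000003fu
#define WATCH_TYPE_META			0
#define WATCH_META_SKIP_NOTIFICATION	0

struct watch_notification {
	uint32_t type:24;	/* enum watch_notification_type */
	uint32_t subtype:8;	/* type-specific subtype */
	uint32_t info;		/* length, watch ID and type-specific info */
};

/*
 * Hypothetical helper (not part of the patch set): walk the records between
 * *tail and head, step over skip records and pass the rest to handle().
 * Returns the number of records delivered, or -1 if a record has a bad
 * length.  A real consumer would read head from the mapped buffer with
 * acquire semantics and publish the new tail with release semantics.
 */
static int drain_ring(const struct watch_notification *slots,
		      uint32_t head, uint32_t *tail, uint32_t mask,
		      void (*handle)(const struct watch_notification *n))
{
	int delivered = 0;

	assert(slots != NULL && tail != NULL);

	while (*tail != head) {
		const struct watch_notification *n = &slots[*tail & mask];
		uint32_t len = n->info & WATCH_INFO_LENGTH;

		/* A zero length would never advance; an overlong one overruns head. */
		if (len == 0 || len > head - *tail)
			return -1;

		if (!(n->type == WATCH_TYPE_META &&
		      n->subtype == WATCH_META_SKIP_NOTIFICATION)) {
			if (handle)
				handle(n);
			delivered++;
		}

		/* Only the tail index is ever advanced by the consumer. */
		*tail += len;
	}
	return delivered;
}
```

The length check is the important part of the sketch: since the indices live in memory shared with the producer, a consumer probably should not trust a record length that is zero or that runs past the head.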
* [PATCH 01/11] uapi: General notification ring definitions [ver #7]
From: David Howells @ 2019-08-30 13:57 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb,
	linux-security-module, linux-fsdevel, linux-api, linux-block,
	linux-kernel

Add UAPI definitions for the general notification ring, including the
following pieces:

 (1) struct watch_notification.

     This is the metadata header for each entry in the ring.  It includes a
     type and subtype that indicate the source of the message
     (eg. WATCH_TYPE_MOUNT_NOTIFY) and the kind of the message
     (eg. NOTIFY_MOUNT_NEW_MOUNT).

     The header also contains an information field that conveys the
     following information:

	- WATCH_INFO_LENGTH.  The size of the entry (entries are variable
          length).

	- WATCH_INFO_ID.  The watch ID specified when the watchpoint was
          set.

	- WATCH_INFO_TYPE_INFO.  (Sub)type-specific information.

	- WATCH_INFO_FLAG_*.  Flag bits overlain on the type-specific
          information.  For use by the type.

     All the information in the header can be used in filtering messages at
     the point of writing into the buffer.

 (2) struct watch_queue_buffer.

     This describes the layout of the ring.  Note that the first slots in
     the ring contain a special metadata entry that contains the ring
     pointers.  The producer in the kernel knows to skip this and it has a
     proper header (WATCH_TYPE_META, WATCH_META_SKIP_NOTIFICATION) that
     indicates the size so that the ring consumer can handle it the same as
     any other record and just skip it.

     Note that this means that ring entries can never be split over the end
     of the ring, so if an entry would need to be split, a skip record is
     inserted to wrap the ring first; this is also WATCH_TYPE_META,
     WATCH_META_SKIP_NOTIFICATION.

 (3) WATCH_INFO_NOTIFICATIONS_LOST.

     This is a flag that can be set in the metadata header by the kernel to
     indicate that at least one message was lost since it was last cleared
     by userspace.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

 include/uapi/linux/watch_queue.h |   67 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 include/uapi/linux/watch_queue.h

diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
new file mode 100644
index 000000000000..70f575099968
--- /dev/null
+++ b/include/uapi/linux/watch_queue.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_WATCH_QUEUE_H
+#define _UAPI_LINUX_WATCH_QUEUE_H
+
+#include <linux/types.h>
+
+enum watch_notification_type {
+	WATCH_TYPE_META		= 0,	/* Special record */
+	WATCH_TYPE___NR		= 1
+};
+
+enum watch_meta_notification_subtype {
+	WATCH_META_SKIP_NOTIFICATION	= 0,	/* Just skip this record */
+	WATCH_META_REMOVAL_NOTIFICATION	= 1,	/* Watched object was removed */
+};
+
+#define WATCH_LENGTH_GRANULARITY sizeof(__u64)
+
+/*
+ * Notification record header.  This is aligned to 64-bits so that subclasses
+ * can contain __u64 fields.
+ */
+struct watch_notification {
+	__u32			type:24;	/* enum watch_notification_type */
+	__u32			subtype:8;	/* Type-specific subtype (filterable) */
+	__u32			info;
+#define WATCH_INFO_LENGTH	0x0000003f	/* Length of record / sizeof(watch_notification) */
+#define WATCH_INFO_LENGTH__SHIFT 0
+#define WATCH_INFO_ID		0x0000ff00	/* ID of watchpoint, if type-appropriate */
+#define WATCH_INFO_ID__SHIFT	8
+#define WATCH_INFO_TYPE_INFO	0xffff0000	/* Type-specific info */
+#define WATCH_INFO_TYPE_INFO__SHIFT 16
+#define WATCH_INFO_FLAG_0	0x00010000	/* Type-specific info, flag bit 0 */
+#define WATCH_INFO_FLAG_1	0x00020000	/* ... */
+#define WATCH_INFO_FLAG_2	0x00040000
+#define WATCH_INFO_FLAG_3	0x00080000
+#define WATCH_INFO_FLAG_4	0x00100000
+#define WATCH_INFO_FLAG_5	0x00200000
+#define WATCH_INFO_FLAG_6	0x00400000
+#define WATCH_INFO_FLAG_7	0x00800000
+} __attribute__((aligned(WATCH_LENGTH_GRANULARITY)));
+
+struct watch_queue_buffer {
+	union {
+		/* The first few entries are special, containing the
+		 * ring management variables.
+		 */
+		struct {
+			struct watch_notification watch; /* WATCH_TYPE_META */
+			__u32		head;		/* Ring head index */
+			__u32		tail;		/* Ring tail index */
+			__u32		mask;		/* Ring index mask */
+			__u32		__reserved;
+		} meta;
+		struct watch_notification slots[0];
+	};
+};
+
+/*
+ * The Metadata pseudo-notification message uses a flag bit in the information
+ * field to convey the fact that messages have been lost.  We can only use a
+ * single bit in this manner per word as some arches that support SMP
+ * (eg. parisc) have no kernel<->user atomic bit ops.
+ */
+#define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0
+
+#endif /* _UAPI_LINUX_WATCH_QUEUE_H */

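As an illustration of how the info word in struct watch_notification is laid out, here is a small sketch of accessor helpers. The masks and shifts are copied from the header in this patch; the helper names (watch_info_slots() and friends) are hypothetical and are not defined anywhere in the series. The byte conversion assumes, going by the header comment, that the length field counts units of sizeof(struct watch_notification), which is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks and shifts copied from the patch's include/uapi/linux/watch_queue.h. */
#define WATCH_INFO_LENGTH		0x0000003fu
#define WATCH_INFO_LENGTH__SHIFT	0
#define WATCH_INFO_ID			0x0000ff00u
#define WATCH_INFO_ID__SHIFT		8
#define WATCH_INFO_TYPE_INFO		0xffff0000u
#define WATCH_INFO_TYPE_INFO__SHIFT	16
#define WATCH_INFO_FLAG_0		0x00010000u
#define WATCH_INFO_NOTIFICATIONS_LOST	WATCH_INFO_FLAG_0
#define WATCH_LENGTH_GRANULARITY	sizeof(uint64_t)

/* Everything below is a hypothetical helper, for illustration only. */

/* Record length in ring slots (bits 0-5). */
static inline unsigned int watch_info_slots(uint32_t info)
{
	return (info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
}

/* Record length in bytes, assuming one slot is 8 bytes. */
static inline size_t watch_info_bytes(uint32_t info)
{
	return watch_info_slots(info) * WATCH_LENGTH_GRANULARITY;
}

/* Tag given by userspace when the watch was set (bits 8-15). */
static inline unsigned int watch_info_id(uint32_t info)
{
	return (info & WATCH_INFO_ID) >> WATCH_INFO_ID__SHIFT;
}

/* Type-specific information (bits 16-31). */
static inline unsigned int watch_info_type_info(uint32_t info)
{
	return (info & WATCH_INFO_TYPE_INFO) >> WATCH_INFO_TYPE_INFO__SHIFT;
}

/* Lost-message flag, as carried in the metadata record's info word. */
static inline int watch_info_lost(uint32_t info)
{
	return (info & WATCH_INFO_NOTIFICATIONS_LOST) != 0;
}

/* Build an info word from its three fields. */
static inline uint32_t watch_info_pack(unsigned int slots, unsigned int id,
				       unsigned int type_info)
{
	assert(slots <= 0x3f && id <= 0xff && type_info <= 0xffff);
	return ((uint32_t)slots << WATCH_INFO_LENGTH__SHIFT) |
	       ((uint32_t)id << WATCH_INFO_ID__SHIFT) |
	       ((uint32_t)type_info << WATCH_INFO_TYPE_INFO__SHIFT);
}
```

With a six-bit length field, the largest record this layout can describe is 63 slots, or 504 bytes under the 8-byte assumption.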
* [PATCH 02/11] security: Add hooks to rule on setting a watch [ver #7]
From: David Howells @ 2019-08-30 13:57 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb,
	linux-security-module, linux-fsdevel, linux-api, linux-block,
	linux-kernel

Add security hooks that will allow an LSM to rule on whether or not a watch
may be set.  More than one hook is required as the watches watch different
types of object.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: Stephen Smalley <sds@tycho.nsa.gov>
cc: linux-security-module@vger.kernel.org
---

 include/linux/lsm_hooks.h |   24 ++++++++++++++++++++++++
 include/linux/security.h  |   17 +++++++++++++++++
 security/security.c       |   14 ++++++++++++++
 3 files changed, 55 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index df1318d85f7d..b0cdefcda4e6 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1413,6 +1413,18 @@
  *	@ctx is a pointer in which to place the allocated security context.
  *	@ctxlen points to the place to put the length of @ctx.
  *
+ * Security hooks for the general notification queue:
+ *
+ * @watch_key:
+ *	Check to see if a process is allowed to watch for event notifications
+ *	from a key or keyring.
+ *	@key: The key to watch.
+ *
+ * @watch_devices:
+ *	Check to see if a process is allowed to watch for event notifications
+ *	from devices (as a global set).
+ *
+ *
  * Security hooks for using the eBPF maps and programs functionalities through
  * eBPF syscalls.
  *
@@ -1688,6 +1700,12 @@ union security_list_options {
 	int (*inode_notifysecctx)(struct inode *inode, void *ctx, u32 ctxlen);
 	int (*inode_setsecctx)(struct dentry *dentry, void *ctx, u32 ctxlen);
 	int (*inode_getsecctx)(struct inode *inode, void **ctx, u32 *ctxlen);
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	int (*watch_key)(struct key *key);
+#endif
+#ifdef CONFIG_DEVICE_NOTIFICATIONS
+	int (*watch_devices)(void);
+#endif
 
 #ifdef CONFIG_SECURITY_NETWORK
 	int (*unix_stream_connect)(struct sock *sock, struct sock *other,
@@ -1964,6 +1982,12 @@ struct security_hook_heads {
 	struct hlist_head inode_notifysecctx;
 	struct hlist_head inode_setsecctx;
 	struct hlist_head inode_getsecctx;
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	struct hlist_head watch_key;
+#endif
+#ifdef CONFIG_DEVICE_NOTIFICATIONS
+	struct hlist_head watch_devices;
+#endif
 #ifdef CONFIG_SECURITY_NETWORK
 	struct hlist_head unix_stream_connect;
 	struct hlist_head unix_may_send;
diff --git a/include/linux/security.h b/include/linux/security.h
index 5f7441abbf42..3be44354d308 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1206,6 +1206,23 @@ static inline int security_inode_getsecctx(struct inode *inode, void **ctx, u32
 }
 #endif	/* CONFIG_SECURITY */
 
+#if defined(CONFIG_SECURITY) && defined(CONFIG_KEY_NOTIFICATIONS)
+int security_watch_key(struct key *key);
+#else
+static inline int security_watch_key(struct key *key)
+{
+	return 0;
+}
+#endif
+#if defined(CONFIG_SECURITY) && defined(CONFIG_DEVICE_NOTIFICATIONS)
+int security_watch_devices(void);
+#else
+static inline int security_watch_devices(void)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SECURITY_NETWORK
 
 int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk);
diff --git a/security/security.c b/security/security.c
index 250ee2d76406..007eb48bc848 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1916,6 +1916,20 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
 }
 EXPORT_SYMBOL(security_inode_getsecctx);
 
+#ifdef CONFIG_KEY_NOTIFICATIONS
+int security_watch_key(struct key *key)
+{
+	return call_int_hook(watch_key, 0, key);
+}
+#endif
+
+#ifdef CONFIG_DEVICE_NOTIFICATIONS
+int security_watch_devices(void)
+{
+	return call_int_hook(watch_devices, 0);
+}
+#endif
+
 #ifdef CONFIG_SECURITY_NETWORK
 
 int security_unix_stream_connect(struct sock *sock, struct sock *other, struct sock *newsk)

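To make the behaviour of security_watch_key() a little more concrete, the following is a simplified userspace model of what call_int_hook(watch_key, 0, key) is understood to do: start from the default result of 0, call each registered hook in order and stop at the first one that returns non-zero. This is only a sketch under that assumption. The array-based hook chain and the names model_security_watch_key(), allow_all() and deny_all() are hypothetical; the kernel itself walks an hlist of registered LSM hooks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct key;	/* opaque stand-in for the kernel's struct key */

typedef int (*watch_key_hook_t)(struct key *key);

/*
 * Hypothetical model of the hook chain: returns 0 if no hook objects,
 * otherwise the first non-zero result, without calling later hooks.
 */
static int model_security_watch_key(const watch_key_hook_t *hooks, size_t nr,
				    struct key *key)
{
	int rc = 0;	/* default result when no LSM implements the hook */

	assert(hooks != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		rc = hooks[i](key);
		if (rc != 0)
			break;	/* first refusal wins */
	}
	return rc;
}

/* Example hooks: one that permits every watch, one that refuses them all. */
static int allow_all(struct key *key)
{
	(void)key;
	return 0;
}

static int deny_all(struct key *key)
{
	(void)key;
	return -EACCES;
}
```

In this model, a kernel built without CONFIG_SECURITY corresponds to an empty chain, which matches the static inline stubs in security.h that just return 0.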
* [PATCH 03/11] security: Add a hook for the point of notification insertion [ver #7] 2019-08-30 13:57 ` David Howells (?) @ 2019-08-30 13:57 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:57 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Add a security hook that allows an LSM to rule on whether a notification message is allowed to be inserted into a particular watch queue. The hook is given the following information: (1) The credentials of the triggerer (which may be init_cred for a system notification, eg. a hardware error). (2) The credentials of whoever set the watch. (3) The notification message. Signed-off-by: David Howells <dhowells@redhat.com> cc: Casey Schaufler <casey@schaufler-ca.com> cc: Stephen Smalley <sds@tycho.nsa.gov> cc: linux-security-module@vger.kernel.org --- include/linux/lsm_hooks.h | 14 ++++++++++++++ include/linux/security.h | 15 ++++++++++++++- security/security.c | 9 +++++++++ 3 files changed, 37 insertions(+), 1 deletion(-) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index b0cdefcda4e6..257d803dcf6f 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1424,6 +1424,12 @@ * Check to see if a process is allowed to watch for event notifications * from devices (as a global set). * + * @post_notification: + * Check to see if a watch notification can be posted to a particular + * queue. + * @w_cred: The credentials of the whoever set the watch. + * @cred: The event-triggerer's credentials + * @n: The notification being posted * * Security hooks for using the eBPF maps and programs functionalities through * eBPF syscalls. 
@@ -1706,6 +1712,11 @@ union security_list_options { #ifdef CONFIG_DEVICE_NOTIFICATIONS int (*watch_devices)(void); #endif +#ifdef CONFIG_WATCH_QUEUE + int (*post_notification)(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n); +#endif #ifdef CONFIG_SECURITY_NETWORK int (*unix_stream_connect)(struct sock *sock, struct sock *other, @@ -1988,6 +1999,9 @@ struct security_hook_heads { #ifdef CONFIG_DEVICE_NOTIFICATIONS struct hlist_head watch_devices; #endif +#ifdef CONFIG_WATCH_QUEUE + struct hlist_head post_notification; +#endif /* CONFIG_WATCH_QUEUE */ #ifdef CONFIG_SECURITY_NETWORK struct hlist_head unix_stream_connect; struct hlist_head unix_may_send; diff --git a/include/linux/security.h b/include/linux/security.h index 3be44354d308..24c54b9ff0a1 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -57,6 +57,8 @@ struct mm_struct; struct fs_context; struct fs_parameter; enum fs_value_type; +struct watch; +struct watch_notification; /* Default (no) options for the capable function */ #define CAP_OPT_NONE 0x0 @@ -1222,6 +1224,18 @@ static inline int security_watch_devices(void) return 0; } #endif +#if defined(CONFIG_SECURITY) && defined(CONFIG_WATCH_QUEUE) +int security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n); +#else +static inline int security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n) +{ + return 0; +} +#endif #ifdef CONFIG_SECURITY_NETWORK @@ -1847,4 +1861,3 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux) #endif /* CONFIG_BPF_SYSCALL */ #endif /* ! 
__LINUX_SECURITY_H */ - diff --git a/security/security.c b/security/security.c index 007eb48bc848..b719c5a5b2ba 100644 --- a/security/security.c +++ b/security/security.c @@ -1916,6 +1916,15 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen) } EXPORT_SYMBOL(security_inode_getsecctx); +#ifdef CONFIG_WATCH_QUEUE +int security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n) +{ + return call_int_hook(post_notification, 0, w_cred, cred, n); +} +#endif /* CONFIG_WATCH_QUEUE */ + #ifdef CONFIG_KEY_NOTIFICATIONS int security_watch_key(struct key *key) { ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #7] 2019-08-30 13:57 ` David Howells (?) @ 2019-08-30 13:57 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:57 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Implement a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_NR_PAGES, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. 
mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 893 ++++++++++++++++++++++++++++++++++ include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 7 files changed, 1465 insertions(+) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============================ +Mappable notifications queue +============================ + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. 
This can be used in conjunction with:: + + * Key/keyring notifications + + * General device event notifications + + +The notification buffers can be enabled by: + + "Device Drivers"/"Misc devices"/"Mappable notification queue" + (CONFIG_WATCH_QUEUE) + +This document has the following sections: + +.. contents:: :local: + + +Overview +======== + +This facility appears as a misc device file that is opened and then mapped and +polled. Each time it is opened, it creates a new buffer specific to the +returned file descriptor. Then, when the opening process sets watches, it +indicates the particular buffer it wants notifications from that watch to be +written into. Note that there are no read() and write() methods (except for +debugging). The user is expected to access the ring directly and to use poll +to wait for new data. + +If a watch is in place, notifications are only written into the buffer if the +filter criteria are passed and if there's sufficient space available in the +ring. If either condition is not met, the notification is discarded. In the +latter case, an overrun indicator will also be set. + +Note that when producing a notification, the kernel does not wait for the +consumers to collect it, but rather just continues on. This means that +notifications can be generated whilst spinlocks are held and also protects the +kernel from being held up indefinitely by a userspace malfunction. + +As far as the ring goes, the head index belongs to the kernel and the tail +index belongs to userspace. The kernel will refuse to write anything if the +tail index becomes invalid. Userspace *must* use appropriate memory barriers +between reading or updating the tail index and reading the ring. 
+ + +Record Structure +================ + +Notification records in the ring may occupy a variable number of slots within +the buffer, beginning with a 1-slot header:: + + struct watch_notification { + __u32 type:24; + __u32 subtype:8; + __u32 info; + } __attribute__((aligned(WATCH_LENGTH_GRANULARITY))); + +"type" indicates the source of the notification record and "subtype" indicates +the type of record from that source (see the Watch Sources section below). The +type may also be "WATCH_TYPE_META". This is a special record type generated +internally by the watch queue driver itself. There are two subtypes, one of +which indicates records that should be just skipped (padding or metadata): + + * WATCH_META_SKIP_NOTIFICATION + * WATCH_META_REMOVAL_NOTIFICATION + +The former indicates a record that should just be skipped and the latter +indicates that an object on which a watch was installed was removed or +destroyed. + +"info" indicates a bunch of things, including: + + * The length of the record in units of buffer slots (mask with + WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates + the size of the record, which may be between 1 and 63 slots. To turn this + into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY. + + * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT). + This is the caller's ID for the watch, which may be between 0 + and 255. Multiple watches may share a queue, and this provides a means to + distinguish them. + + * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST) + that indicates that some notifications were lost for some reason, including + buffer overrun, insufficient memory and inconsistent tail index. + + * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the + notification producer to indicate some meaning specific to the type and + subtype. + +Everything in info apart from the length can be used for filtering. 
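[Editorial sketch, not part of the patch] The mask-and-shift decoding of the "info" field described in the Record Structure section can be illustrated in plain userspace C. The EX_* masks and shifts below are assumptions chosen only to match the stated ranges (1 to 63 slots, IDs 0 to 255, 8-byte slots); the real values are whatever include/uapi/linux/watch_queue.h defines and may well differ. All ex_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; the real masks and shifts are
 * defined in include/uapi/linux/watch_queue.h and may differ. */
#define EX_LENGTH_GRANULARITY 8u          /* bytes per slot, as stated in the text */
#define EX_INFO_LENGTH        0x0000003fu /* 6 bits: record length, 1..63 slots */
#define EX_INFO_LENGTH_SHIFT  0
#define EX_INFO_ID            0x0000ff00u /* 8 bits: watch ID, 0..255 */
#define EX_INFO_ID_SHIFT      8

/* Build an info word from a slot count and a watch ID (helper for examples). */
static inline uint32_t ex_make_info(unsigned int slots, unsigned int id)
{
	assert(slots >= 1 && slots <= 63);
	assert(id <= 255);
	return ((uint32_t)slots << EX_INFO_LENGTH_SHIFT) |
	       ((uint32_t)id << EX_INFO_ID_SHIFT);
}

/* Record length in slots: mask, then shift. */
static inline unsigned int ex_record_slots(uint32_t info)
{
	return (info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
}

/* Record length in bytes: slots multiplied by the slot granularity. */
static inline size_t ex_record_bytes(uint32_t info)
{
	return (size_t)ex_record_slots(info) * EX_LENGTH_GRANULARITY;
}

/* Caller-supplied watch ID: mask, then shift. */
static inline unsigned int ex_watch_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
}
```

A consumer would apply the same two steps to n->info using the real WATCH_INFO_* constants; a decoded length of 0 should be treated as corruption, as the userspace example in the patch does.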
+ + +Ring Structure +============== + +The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The +caller uses an ioctl() to set the size of the ring after opening and this must +be a power-of-2 multiple of the system page size (so that the mask can be used +with AND). + +The head and tail indices are stored in the first two slots in the ring, which +are marked out as a skippable entry:: + + struct watch_queue_buffer { + union { + struct { + struct watch_notification watch; + volatile __u32 head; + volatile __u32 tail; + __u32 mask; + } meta; + struct watch_notification slots[0]; + }; + }; + +In "meta.watch", type will be set to WATCH_TYPE_META and subtype to +WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just +skip this record. Also, because this record is here, records cannot wrap round +the end of the buffer, so a skippable padding element will be inserted at the +end of the buffer if needed. Thus the contents of a notification record in the +buffer are always contiguous. + +"meta.mask" is an AND'able mask to turn the index counters into slots array +indices. + +The buffer is empty if "meta.head" == "meta.tail". + +[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into +"slots[]" not byte offsets into the buffer. + +[!] NOTE that userspace must never change the head pointer. This belongs to +the kernel and will be updated by that. The kernel will never change the tail +pointer. + +[!] NOTE that userspace must never AND-off the tail pointer before updating it, +but should just keep adding to it and letting it wrap naturally. The value +*should* be masked off when used as an index into slots[]. + +[!] NOTE that if the distance between head and tail becomes too great, the +kernel will assume the buffer is full and write no more until the issue is +resolved. 
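[Editorial sketch, not part of the patch] The index rules in the Ring Structure notes (free-running head and tail, masking only when indexing slots[], and a sanity limit on their distance) can be sketched with unsigned arithmetic. The ex_* helpers are hypothetical, and the consistency check assumes that "too great" means more than the number of slots, which matches the POLLERR description but is not confirmed by the text shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The buffer is empty when the two free-running indices are equal. */
static inline bool ex_ring_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

/* Occupied slots; unsigned subtraction stays correct when head has wrapped
 * past 2^32 and tail has not yet done so. */
static inline uint32_t ex_ring_used(uint32_t head, uint32_t tail)
{
	return head - tail;
}

/* Assumption: indices further apart than the ring size are inconsistent. */
static inline bool ex_ring_consistent(uint32_t head, uint32_t tail,
				      uint32_t mask)
{
	return (uint64_t)ex_ring_used(head, tail) <= (uint64_t)mask + 1;
}

/* Mask an index only at the point of use as a slots[] subscript. */
static inline uint32_t ex_slot_index(uint32_t index, uint32_t mask)
{
	assert(((mask + 1) & mask) == 0); /* mask must be 2^n - 1 */
	return index & mask;
}
```

For instance, a 4-page ring with 4096-byte pages would hold 2048 eight-byte slots, giving a mask of 0x7ff. The stored tail keeps counting upwards and is never masked before being written back.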
+ + +Watch List (Notification Source) API +==================================== + +A "watch list" is a list of watchers that are subscribed to a source of +notifications. A list may be attached to an object (say a key or a superblock) +or may be global (say for device events). From a userspace perspective, a +non-global watch list is typically referred to by reference to the object it +belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to +watch that specific key). + +To manage a watch list, the following functions are provided: + + * ``void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *wlist));`` + + Initialise a watch list. If ``release_watch`` is not NULL, then this + indicates a function that should be called when the watch_list object is + destroyed to discard any references the watch list holds on the watched + object. + + * ``void remove_watch_list(struct watch_list *wlist);`` + + This removes all of the watches subscribed to a watch_list and frees them + and then destroys the watch_list object itself. + + +Watch Queue (Notification Buffer) API +===================================== + +A "watch queue" is the buffer allocated by or on behalf of the application that +notification records will be written into. The workings of this are hidden +entirely inside of the watch_queue device driver, but it is necessary to gain a +reference to it to place a watch. These can be managed with: + + * ``struct watch_queue *get_watch_queue(int fd);`` + + Since watch queues are indicated to the kernel by the fd of the character + device that implements the buffer, userspace must hand that fd through a + system call. This can be used to look up an opaque pointer to the watch + queue from the system call. + + * ``void put_watch_queue(struct watch_queue *wqueue);`` + + This discards the reference obtained from ``get_watch_queue()``. 
+ + +Watch Subscription API +====================== + +A "watch" is a subscription on a watch list, indicating the watch queue, and +thus the buffer, into which notification records should be written. The watch +queue object may also carry filtering rules for that object, as set by +userspace. Some parts of the watch struct can be set by the driver:: + + struct watch { + union { + u32 info_id; /* ID to be OR'd in to info field */ + ... + }; + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + ... + }; + +The ``info_id`` value should be an 8-bit number obtained from userspace and +shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of +struct watch_notification::info when and if the notification is written into +the associated watch queue buffer. + +The ``private`` field is the driver's data associated with the watch_list and +is cleaned up by the ``watch_list::release_watch()`` method. + +The ``id`` field is the source's ID. Notifications that are posted with a +different ID are ignored. + +The following functions are provided to manage watches: + + * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);`` + + Initialise a watch object, setting its pointer to the watch queue, using + appropriate barriering to avoid lockdep complaints. + + * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);`` + + Subscribe a watch to a watch list (notification source). The + driver-settable fields in the watch struct must have been set before this + is called. + + * ``int remove_watch_from_object(struct watch_list *wlist, + struct watch_queue *wqueue, + u64 id, false);`` + + Remove a watch from a watch list, where the watch must match the specified + watch queue (``wqueue``) and object identifier (``id``). A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to + indicate that the watch got removed. 
+ + * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);`` + + Remove all the watches from a watch list. It is expected that this will be + called preparatory to destruction and that the watch list will be + inaccessible to new watches by this point. A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each + subscribed watch to indicate that the watch got removed. + + +Notification Posting API +======================== + +To post a notification to a watch list so that the subscribed watches can see it, +the following function should be used:: + + void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id); + +The notification should be preformatted and a pointer to the header (``n``) +should be passed in. The notification may be larger than this and the size in +units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``. + +The ``cred`` struct indicates the credentials of the source (subject) and is +passed to the LSMs, such as SELinux, to allow or suppress the recording of the +note in each individual queue according to the credentials of that queue +(object). + +The ``id`` is the ID of the source object (such as the serial number on a key). +Only watches that have the same ID set in them will see this notification. + + +Watch Sources +============= + +Any particular buffer can be fed from multiple sources. Sources include: + + * WATCH_TYPE_KEY_NOTIFY + + Notifications of this type indicate changes to keys and keyrings, including + changes to keyring contents or to the attributes of keys. + + See Documentation/security/keys/core.rst for more information. + + * WATCH_TYPE_BLOCK_NOTIFY + + Notifications of this type indicate block layer events, such as I/O errors + or temporary link loss. Watches of this type are set on a global queue. 
+ + +Event Filtering +=============== + +Once a watch queue has been created, a set of filters can be applied to limit +the events that are received using:: + + struct watch_notification_filter filter = { + ... + }; + ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) + +The filter description is a variable of type:: + + struct watch_notification_filter { + __u32 nr_filters; + __u32 __reserved; + struct watch_notification_type_filter filters[]; + }; + +Where "nr_filters" is the number of filters in filters[] and "__reserved" +should be 0. The "filters" array has elements of the following type:: + + struct watch_notification_type_filter { + __u32 type; + __u32 info_filter; + __u32 info_mask; + __u32 subtype_filter[8]; + }; + +Where: + + * ``type`` is the event type to filter for and should be something like + "WATCH_TYPE_KEY_NOTIFY" + + * ``info_filter`` and ``info_mask`` act as a filter on the info field of the + notification record. The notification is only written into the buffer if:: + + (watch.info & info_mask) == info_filter + + This could be used, for example, to ignore events that are not exactly on + the watched point in a mount tree. + + * ``subtype_filter`` is a bitmask indicating the subtypes that are of + interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to + subtype 1, and so on. + +If the argument to the ioctl() is NULL, then the filters will be removed and +all events from the watched sources will come through. + + +Waiting For Events +================== + +The file descriptor that holds the buffer may be used with poll() and similar. +POLLIN and POLLRDNORM are set if the buffer indices differ. POLLERR is set if +the buffer indices are further apart than the size of the buffer. Wake-up +events are only generated if the buffer is transitioned from an empty state. 
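[Editorial sketch, not part of the patch] The per-type filter rules from the Event Filtering section (subtype bitmap plus the info mask test) can be modelled in userspace C as follows. struct ex_type_filter is a local mirror of the documented layout, and the ex_* functions are hypothetical helpers. The sketch assumes a record must match the type, have its subtype bit set, and pass the info test to be accepted by a given filter entry; it does not claim to reproduce the driver's actual matching code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the documented per-type filter, for illustration only. */
struct ex_type_filter {
	uint32_t type;
	uint32_t info_filter;
	uint32_t info_mask;
	uint32_t subtype_filter[8]; /* 8 * 32 = 256 subtype bits */
};

/* Mark a subtype as wanted: bit (subtype % 32) of word (subtype / 32). */
static inline void ex_filter_want_subtype(struct ex_type_filter *f,
					  uint8_t subtype)
{
	assert(f != NULL);
	f->subtype_filter[subtype / 32] |= UINT32_C(1) << (subtype % 32);
}

/* Would a record with this type, subtype and info pass this filter entry? */
static inline bool ex_filter_accepts(const struct ex_type_filter *f,
				     uint32_t type, uint8_t subtype,
				     uint32_t info)
{
	assert(f != NULL);
	if (type != f->type)
		return false;
	if (!(f->subtype_filter[subtype / 32] &
	      (UINT32_C(1) << (subtype % 32))))
		return false;
	/* The documented test: (watch.info & info_mask) == info_filter */
	return (info & f->info_mask) == f->info_filter;
}
```

With info_mask and info_filter both left at 0, the info test always passes, so only the type and subtype bitmap decide the outcome.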
+ + +Userspace Code Example +====================== + +A buffer is created with something like the following:: + + fd = open("/dev/watch_queue", O_RDWR); + + #define BUF_SIZE 4 + ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + +It can then be set to receive keyring change notifications and device event +notifications:: + + keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01); + + watch_devices(fd, 0x2); + +The notifications can then be consumed by something like the following:: + + extern void saw_key_change(struct watch_notification *n); + extern void saw_block_event(struct watch_notification *n); + extern void saw_usb_event(struct watch_notification *n); + + static int consumer(int fd, struct watch_queue_buffer *buf) + { + struct watch_notification *n; + struct pollfd p[1]; + unsigned int len, head, tail, mask = buf->meta.mask; + + for (;;) { + p[0].fd = fd; + p[0].events = POLLIN | POLLERR; + p[0].revents = 0; + + if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR) + goto went_wrong; + + while (head = _atomic_load_acquire(buf->meta.head), + tail = buf->meta.tail, + tail != head + ) { + n = &buf->slots[tail & mask]; + len = (n->info & WATCH_INFO_LENGTH) >> + WATCH_INFO_LENGTH__SHIFT; + if (len == 0) + goto went_wrong; + + switch (n->type) { + case WATCH_TYPE_KEY_NOTIFY: + saw_key_change(n); + break; + case WATCH_TYPE_BLOCK_NOTIFY: + saw_block_event(n); + break; + case WATCH_TYPE_USB_NOTIFY: + saw_usb_event(n); + break; + } + + tail += len; + _atomic_store_release(buf->meta.tail, tail); + } + } + + went_wrong: + return 0; + } + +Note the memory barriers when loading the head pointer and storing the tail +pointer! 
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..bef58948cf1b --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,893 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or + * userspace mucked up the ring indices). + */ +static bool write_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + struct watch_queue_buffer *buf = wqueue->buffer; + struct watch_notification *p; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + unsigned int size = wqueue->size, mask = size - 1; + unsigned int len; + unsigned int ring_tail, tail, head, used, gap, h; + + ring_tail = READ_ONCE(buf->meta.tail); + head = READ_ONCE(buf->meta.head); + used = head - ring_tail; + + /* Check to see if userspace mucked up the pointers */ + if (used >= size) + goto lost_event; /* Inconsistent */ + tail = ring_tail & mask; + if (tail > 0 && tail < metalen) + goto lost_event; /* Inconsistent */ + + len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + h = head & mask; + if (h >= tail) { + /* Head is at or after tail in the buffer. There may then be + * two gaps: one to the end of buffer and one at the beginning + * of the buffer between the metadata block and the tail + * pointer. + */ + gap = size - h; + if (len > gap) { + /* Not enough space in the post-head gap; we need to + * wrap. When wrapping, we will have to skip the + * metadata at the beginning of the buffer. + */ + if (len > tail - metalen) + goto lost_event; /* Overrun */ + + /* Fill the space at the end of the page */ + p = &buf->slots[h]; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = gap << WATCH_INFO_LENGTH__SHIFT; + head += gap; + h = 0; + if (h >= tail) + goto lost_event; /* Overrun */ + } + } + + if (h == 0) { + /* Reset and skip the header metadata */ + p = &buf->meta.watch; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = metalen << WATCH_INFO_LENGTH__SHIFT; + head += metalen; + h = metalen; + if (h == tail) + goto lost_event; /* Overrun */ + } + + if (h < tail) { + /* Head is before tail in the buffer. 
*/ + gap = tail - h; + if (len > gap) + goto lost_event; /* Overrun */ + } + + n->info |= wqueue->flag_next; + wqueue->flag_next = 0; + p = &buf->slots[h]; + memcpy(p, n, len * gran); + head += len; + + smp_store_release(&buf->meta.head, head); + if (used == 0) + wake_up(&wqueue->waiters); + return true; + +lost_event: + WRITE_ONCE(buf->meta.watch.info, + buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST); + return false; +} + +/* + * Post a notification to a watch queue. + */ +static bool post_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + bool done = false; + + if (!wqueue->buffer) + return false; + + spin_lock_bh(&wqueue->lock); /* Protect head pointer */ + + if (!wqueue->defunct) + done = write_one_notification(wqueue, n); + spin_unlock_bh(&wqueue->lock); + return done; +} + +/* + * Apply filter rules to a notification. + */ +static bool filter_watch_notification(const struct watch_filter *wf, + const struct watch_notification *n) +{ + const struct watch_type_filter *wt; + int i; + + if (!test_bit(n->type, wf->type_filter)) + return false; + + for (i = 0; i < wf->nr_filters; i++) { + wt = &wf->filters[i]; + if (n->type == wt->type && + (wt->subtype_filter[n->subtype >> 5] & + (1U << (n->subtype & 31))) && + (n->info & wt->info_mask) == wt->info_filter) + return true; + } + + return false; /* If there is a filter, the default is to reject. */ +} + +/** + * __post_watch_notification - Post an event notification + * @wlist: The watch list to post the event to. + * @n: The notification record to post. + * @cred: The creds of the process that triggered the notification. + * @id: The ID to match on the watch. + * + * Post a notification of an event into a set of watch queues and let the users + * know. + * + * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and + * should be in units of sizeof(*n). 
+ */ +void __post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + const struct watch_filter *wf; + struct watch_queue *wqueue; + struct watch *watch; + + if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) { + WARN_ON(1); + return; + } + + rcu_read_lock(); + + hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) { + if (watch->id != id) + continue; + n->info &= ~WATCH_INFO_ID; + n->info |= watch->info_id; + + wqueue = rcu_dereference(watch->queue); + wf = rcu_dereference(wqueue->filter); + if (wf && !filter_watch_notification(wf, n)) + continue; + + if (security_post_notification(watch->cred, cred, n) < 0) + continue; + + post_one_notification(wqueue, n); + } + + rcu_read_unlock(); +} +EXPORT_SYMBOL(__post_watch_notification); + +/* + * Allow the queue to be polled. + */ +static __poll_t watch_queue_poll(struct file *file, poll_table *wait) +{ + struct watch_queue *wqueue = file->private_data; + struct watch_queue_buffer *buf = wqueue->buffer; + unsigned int head, tail; + __poll_t mask = 0; + + if (!buf) + return EPOLLERR; + + poll_wait(file, &wqueue->waiters, wait); + + head = READ_ONCE(buf->meta.head); + tail = READ_ONCE(buf->meta.tail); + if (head != tail) + mask |= EPOLLIN | EPOLLRDNORM; + if (head - tail > wqueue->size) + mask |= EPOLLERR; + return mask; +} + +static int watch_queue_set_page_dirty(struct page *page) +{ + SetPageDirty(page); + return 0; +} + +static const struct address_space_operations watch_queue_aops = { + .set_page_dirty = watch_queue_set_page_dirty, +}; + +static vm_fault_t watch_queue_fault(struct vm_fault *vmf) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + page = wqueue->pages[vmf->pgoff]; + get_page(page); + if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { + put_page(page); + return VM_FAULT_RETRY; + } + vmf->page = page; + return VM_FAULT_LOCKED; +} + +static int 
watch_queue_account_mem(struct watch_queue *wqueue, + unsigned long nr_pages) +{ + struct user_struct *user = wqueue->owner; + unsigned long page_limit, cur_pages, new_pages; + + /* Don't allow more pages than we can safely lock */ + page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + cur_pages = atomic_long_read(&user->locked_vm); + + do { + new_pages = cur_pages + nr_pages; + if (new_pages > page_limit && !capable(CAP_IPC_LOCK)) + return -ENOMEM; + } while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages, + new_pages)); + + wqueue->nr_pages = nr_pages; + return 0; +} + +static void watch_queue_unaccount_mem(struct watch_queue *wqueue) +{ + struct user_struct *user = wqueue->owner; + + if (wqueue->nr_pages) { + atomic_long_sub(wqueue->nr_pages, &user->locked_vm); + wqueue->nr_pages = 0; + } +} + +static void watch_queue_map_pages(struct vm_fault *vmf, + pgoff_t start_pgoff, pgoff_t end_pgoff) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + rcu_read_lock(); + + do { + page = wqueue->pages[start_pgoff]; + if (trylock_page(page)) { + vm_fault_t ret; + get_page(page); + ret = alloc_set_pte(vmf, NULL, page); + if (ret != 0) + put_page(page); + + unlock_page(page); + } + } while (++start_pgoff < end_pgoff); + + rcu_read_unlock(); +} + +static const struct vm_operations_struct watch_queue_vm_ops = { + .fault = watch_queue_fault, + .map_pages = watch_queue_map_pages, +}; + +/* + * Map the buffer.
+ */ +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + u8 nr_pages; + + inode_lock(inode); + nr_pages = wqueue->nr_pages; + inode_unlock(inode); + + if (nr_pages == 0 || + vma->vm_pgoff != 0 || + vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE || + !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED))) + return -EINVAL; + + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_ops = &watch_queue_vm_ops; + return 0; +} + +/* + * Allocate the required number of pages. + */ +static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages) +{ + struct watch_queue_buffer *buf; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + int i; + + BUILD_BUG_ON(gran != sizeof(__u64)); + + if (wqueue->buffer) + return -EBUSY; + + if (nr_pages == 0 || + nr_pages > 16 || /* TODO: choose a better hard limit */ + !is_power_of_2(nr_pages)) + return -EINVAL; + + if (watch_queue_account_mem(wqueue, nr_pages) < 0) + goto err; + + wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!wqueue->pages) + goto err_unaccount; + + for (i = 0; i < nr_pages; i++) { + wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!wqueue->pages[i]) + goto err_some_pages; + wqueue->pages[i]->mapping = &wqueue->mapping; + SetPageUptodate(wqueue->pages[i]); + } + + buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED); + if (!buf) + goto err_some_pages; + + wqueue->buffer = buf; + wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification)); + + /* The first four slots in the buffer contain metadata about the ring, + * including the head and tail indices and mask. 
+ */ + buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT; + buf->meta.watch.type = WATCH_TYPE_META; + buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION; + buf->meta.mask = wqueue->size - 1; + buf->meta.head = metalen; + buf->meta.tail = metalen; + return 0; + +err_some_pages: + for (i--; i >= 0; i--) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + put_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + wqueue->pages = NULL; +err_unaccount: + watch_queue_unaccount_mem(wqueue); +err: + return -ENOMEM; +} + +/* + * Set the filter on a watch queue. + */ +static long watch_queue_set_filter(struct inode *inode, + struct watch_queue *wqueue, + struct watch_notification_filter __user *_filter) +{ + struct watch_notification_type_filter *tf; + struct watch_notification_filter filter; + struct watch_type_filter *q; + struct watch_filter *wfilter; + int ret, nr_filter = 0, i; + + if (!_filter) { + /* Remove the old filter */ + wfilter = NULL; + goto set; + } + + /* Grab the user's filter specification */ + if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) + return -EFAULT; + if (filter.nr_filters == 0 || + filter.nr_filters > 16 || + filter.__reserved != 0) + return -EINVAL; + + tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + if (IS_ERR(tf)) + return PTR_ERR(tf); + + ret = -EINVAL; + for (i = 0; i < filter.nr_filters; i++) { + if ((tf[i].info_filter & ~tf[i].info_mask) || + tf[i].info_mask & WATCH_INFO_LENGTH) + goto err_filter; + /* Ignore any unknown types */ + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + nr_filter++; + } + + /* Now we need to build the internal filter from only the relevant + * user-specified filters. 
+ */ + ret = -ENOMEM; + wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL); + if (!wfilter) + goto err_filter; + wfilter->nr_filters = nr_filter; + + q = wfilter->filters; + for (i = 0; i < filter.nr_filters; i++) { + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + + q->type = tf[i].type; + q->info_filter = tf[i].info_filter; + q->info_mask = tf[i].info_mask; + q->subtype_filter[0] = tf[i].subtype_filter[0]; + __set_bit(q->type, wfilter->type_filter); + q++; + } + + kfree(tf); +set: + inode_lock(inode); + rcu_swap_protected(wqueue->filter, wfilter, + lockdep_is_held(&inode->i_rwsem)); + inode_unlock(inode); + if (wfilter) + kfree_rcu(wfilter, rcu); + return 0; + +err_filter: + kfree(tf); + return ret; +} + +/* + * Set parameters. + */ +static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + long ret; + + switch (cmd) { + case IOC_WATCH_QUEUE_SET_SIZE: + inode_lock(inode); + ret = watch_queue_set_size(wqueue, arg); + inode_unlock(inode); + return ret; + + case IOC_WATCH_QUEUE_SET_FILTER: + ret = watch_queue_set_filter( + inode, wqueue, + (struct watch_notification_filter __user *)arg); + return ret; + + default: + return -ENOTTY; + } +} + +/* + * Open the file.
+ */ +static int watch_queue_open(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue; + + wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL); + if (!wqueue) + return -ENOMEM; + + wqueue->mapping.a_ops = &watch_queue_aops; + wqueue->mapping.i_mmap = RB_ROOT_CACHED; + init_rwsem(&wqueue->mapping.i_mmap_rwsem); + spin_lock_init(&wqueue->mapping.private_lock); + + kref_init(&wqueue->usage); + spin_lock_init(&wqueue->lock); + init_waitqueue_head(&wqueue->waiters); + wqueue->owner = get_uid(file->f_cred->user); + + file->private_data = wqueue; + return 0; +} + +static void __put_watch_queue(struct kref *kref) +{ + struct watch_queue *wqueue = + container_of(kref, struct watch_queue, usage); + struct watch_filter *wfilter; + + wfilter = rcu_access_pointer(wqueue->filter); + if (wfilter) + kfree_rcu(wfilter, rcu); + free_uid(wqueue->owner); + kfree_rcu(wqueue, rcu); +} + +/** + * put_watch_queue - Dispose of a ref on a watch queue. + * @wqueue: The watch queue to unref. + */ +void put_watch_queue(struct watch_queue *wqueue) +{ + kref_put(&wqueue->usage, __put_watch_queue); +} +EXPORT_SYMBOL(put_watch_queue); + +static void free_watch(struct rcu_head *rcu) +{ + struct watch *watch = container_of(rcu, struct watch, rcu); + + put_watch_queue(rcu_access_pointer(watch->queue)); + put_cred(watch->cred); +} + +static void __put_watch(struct kref *kref) +{ + struct watch *watch = container_of(kref, struct watch, usage); + + call_rcu(&watch->rcu, free_watch); +} + +/* + * Discard a watch. + */ +static void put_watch(struct watch *watch) +{ + kref_put(&watch->usage, __put_watch); +} + +/** + * init_watch - Initialise a watch + * @watch: The watch to initialise. + * @wqueue: The queue to assign. + * + * Initialise a watch and set the watch queue.
+ */ +void init_watch(struct watch *watch, struct watch_queue *wqueue) +{ + kref_init(&watch->usage); + INIT_HLIST_NODE(&watch->list_node); + INIT_HLIST_NODE(&watch->queue_node); + rcu_assign_pointer(watch->queue, wqueue); +} + +/** + * add_watch_to_object - Add a watch on an object to a watch list + * @watch: The watch to add + * @wlist: The watch list to add to + * + * @watch->queue must have been set to point to the queue to post notifications + * to and the watch list of the object to be watched. @watch->cred must also + * have been set to the appropriate credentials and a ref taken on them. + * + * The caller must pin the queue and the list both and must hold the list + * locked against racing watch additions/removals. + */ +int add_watch_to_object(struct watch *watch, struct watch_list *wlist) +{ + struct watch_queue *wqueue = rcu_access_pointer(watch->queue); + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + watch->cred = get_current_cred(); + rcu_assign_pointer(watch->watch_list, wlist); + + spin_lock_bh(&wqueue->lock); + kref_get(&wqueue->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_unlock_bh(&wqueue->lock); + + hlist_add_head(&watch->list_node, &wlist->watchers); + return 0; +} +EXPORT_SYMBOL(add_watch_to_object); + +/** + * remove_watch_from_object - Remove a watch or all watches from an object. + * @wlist: The watch list to remove from + * @wq: The watch queue of interest (ignored if @all is true) + * @id: The ID of the watch to remove (ignored if @all is true) + * @all: True to remove all objects + * + * Remove a specific watch or all watches from an object. A notification is + * sent to the watcher to tell them that this happened. 
+ */ +int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, + u64 id, bool all) +{ + struct watch_notification_removal n; + struct watch_queue *wqueue; + struct watch *watch; + int ret = -EBADSLT; + + rcu_read_lock(); + +again: + spin_lock(&wlist->lock); + hlist_for_each_entry(watch, &wlist->watchers, list_node) { + if (all || + (watch->id == id && rcu_access_pointer(watch->queue) == wq)) + goto found; + } + spin_unlock(&wlist->lock); + goto out; + +found: + ret = 0; + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + spin_unlock(&wlist->lock); + + /* We now own the reference on watch that used to belong to wlist. */ + + n.watch.type = WATCH_TYPE_META; + n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; + n.watch.info = watch->info_id | watch_sizeof(n.watch); + n.id = id; + if (id != 0) + n.watch.info = watch->info_id | watch_sizeof(n); + + wqueue = rcu_dereference(watch->queue); + + /* We don't need the watch list lock for the next bit as RCU is + * protecting *wqueue from deallocation. + */ + if (wqueue) { + post_one_notification(wqueue, &n.watch); + + spin_lock_bh(&wqueue->lock); + + if (!hlist_unhashed(&watch->queue_node)) { + hlist_del_init_rcu(&watch->queue_node); + put_watch(watch); + } + + spin_unlock_bh(&wqueue->lock); + } + + if (wlist->release_watch) { + void (*release_watch)(struct watch *); + + release_watch = wlist->release_watch; + rcu_read_unlock(); + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + + if (all && !hlist_empty(&wlist->watchers)) + goto again; +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(remove_watch_from_object); + +/* + * Remove all the watches that are contributory to a queue. This has the + * potential to race with removal of the watches by the destruction of the + * objects being watched or with the distribution of notifications. 
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */ +static int watch_queue_release(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue = file->private_data; + int i; + + watch_queue_clear(wqueue); + + if (wqueue->buffer) + vunmap(wqueue->buffer); + + for (i = 0; i < wqueue->nr_pages; i++) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + __free_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + watch_queue_unaccount_mem(wqueue); + put_watch_queue(wqueue); + return 0; +} + +static const struct file_operations watch_queue_fops = { + .owner = THIS_MODULE, + .open = watch_queue_open, + .release = watch_queue_release, + .unlocked_ioctl = watch_queue_ioctl, + .poll = watch_queue_poll, + .mmap = watch_queue_mmap, + .llseek = no_llseek, +}; + +/** + * get_watch_queue - Get a watch queue from its file descriptor. + * @fd: The fd to query. + */ +struct watch_queue *get_watch_queue(int fd) +{ + struct watch_queue *wqueue = ERR_PTR(-EBADF); + struct fd f; + + f = fdget(fd); + if (f.file) { + wqueue = ERR_PTR(-EINVAL); + if (f.file->f_op == &watch_queue_fops) { + wqueue = f.file->private_data; + kref_get(&wqueue->usage); + } + fdput(f); + } + + return wqueue; +} +EXPORT_SYMBOL(get_watch_queue); + +static struct miscdevice watch_queue_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "watch_queue", + .fops = &watch_queue_fops, + .mode = 0666, +}; +builtin_misc_device(watch_queue_dev); diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h new file mode 100644 index 000000000000..34d7915cc5b3 --- /dev/null +++ b/include/linux/watch_queue.h @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. + */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + 
+static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. + */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. 
+ * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #7] @ 2019-08-30 13:57 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:57 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Implement a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink).
The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 893 ++++++++++++++++++++++++++++++++++ include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 7 files changed, 1465 insertions(+) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============================ +Mappable notifications queue +============================ + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. 
This can be used in conjunction with:: + + * Key/keyring notifications + + * General device event notifications + + +The notification buffers can be enabled by: + + "Device Drivers"/"Misc devices"/"Mappable notification queue" + (CONFIG_WATCH_QUEUE) + +This document has the following sections: + +.. contents:: :local: + + +Overview +======== + +This facility appears as a misc device file that is opened and then mapped and +polled. Each time it is opened, it creates a new buffer specific to the +returned file descriptor. Then, when the opening process sets watches, it +indicates the particular buffer it wants notifications from that watch to be +written into. Note that there are no read() and write() methods (except for +debugging). The user is expected to access the ring directly and to use poll +to wait for new data. + +If a watch is in place, notifications are only written into the buffer if the +filter criteria are passed and if there's sufficient space available in the +ring. If either of those is not so, the notification will be discarded. In the +latter case, an overrun indicator will also be set. + +Note that when producing a notification, the kernel does not wait for the +consumers to collect it, but rather just continues on. This means that +notifications can be generated whilst spinlocks are held and also protects the +kernel from being held up indefinitely by a userspace malfunction. + +As far as the ring goes, the head index belongs to the kernel and the tail +index belongs to userspace. The kernel will refuse to write anything if the +tail index becomes invalid. Userspace *must* use appropriate memory barriers +between reading or updating the tail index and reading the ring.
+ + +Record Structure +================ + +Notification records in the ring may occupy a variable number of slots within +the buffer, beginning with a 1-slot header:: + + struct watch_notification { + __u32 type:24; + __u32 subtype:8; + __u32 info; + } __attribute__((aligned(WATCH_LENGTH_GRANULARITY))); + +"type" indicates the source of the notification record and "subtype" indicates +the type of record from that source (see the Watch Sources section below). The +type may also be "WATCH_TYPE_META". This is a special record type generated +internally by the watch queue driver itself. There are two subtypes, one of +which indicates records that should be just skipped (padding or metadata): + + * WATCH_META_SKIP_NOTIFICATION + * WATCH_META_REMOVAL_NOTIFICATION + +The former indicates a record that should just be skipped and the latter +indicates that an object on which a watch was installed was removed or +destroyed. + +"info" indicates a bunch of things, including: + + * The length of the record in units of buffer slots (mask with + WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates + the size of the record, which may be between 1 and 63 slots. To turn this + into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY. + + * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT). + This indicates the caller's ID for the watch, which may be between 0 + and 255. Multiple watches may share a queue, and this provides a means to + distinguish them. + + * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST) + that indicates that some notifications were lost for some reason, including + buffer overrun, insufficient memory and inconsistent tail index. + + * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the + notification producer to indicate some meaning specific to the type and + subtype. + +Everything in info apart from the length can be used for filtering.
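The decoding of the "info" field described under Record Structure can be sketched in userspace C as below. This is only an illustration: the EX_* masks and shifts are assumed placeholder values chosen to agree with the ranges stated in the text (1 to 63 slots, IDs 0 to 255), and the ex_* helpers are hypothetical names. Real code should use WATCH_INFO_LENGTH, WATCH_INFO_ID and friends from the UAPI header instead.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed placeholder values for illustration only; the real masks and
 * shifts come from the watch_queue UAPI header. */
#define EX_INFO_LENGTH         0x0000003fU
#define EX_INFO_LENGTH_SHIFT   0
#define EX_INFO_ID             0x0000ff00U
#define EX_INFO_ID_SHIFT       8
#define EX_LENGTH_GRANULARITY  8U	/* bytes per slot, as stated in the text */

/* Build an info word from a record length in slots and a watch ID. */
static uint32_t ex_make_info(unsigned int slots, unsigned int id)
{
	assert(slots >= 1 && slots <= 63);
	assert(id <= 255);
	return ((uint32_t)id << EX_INFO_ID_SHIFT) |
	       ((uint32_t)slots << EX_INFO_LENGTH_SHIFT);
}

/* Record length in buffer slots. */
static unsigned int ex_record_slots(uint32_t info)
{
	return (info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
}

/* Record length in bytes: slots multiplied by the slot granularity. */
static unsigned int ex_record_bytes(uint32_t info)
{
	return ex_record_slots(info) * EX_LENGTH_GRANULARITY;
}

/* The caller-supplied watch ID that distinguishes watches sharing a queue. */
static unsigned int ex_watch_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
}
```

A consumer would typically call something like ex_record_slots() on each header to find out how far to advance the tail index.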
+ + +Ring Structure +============== + +The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The +caller uses an ioctl() to set the size of the ring after opening and this must +be a power-of-2 multiple of the system page size (so that the mask can be used +with AND). + +The head and tail indices are stored in the first two slots in the ring, which +are marked out as a skippable entry:: + + struct watch_queue_buffer { + union { + struct { + struct watch_notification watch; + volatile __u32 head; + volatile __u32 tail; + __u32 mask; + } meta; + struct watch_notification slots[0]; + }; + }; + +In "meta.watch", type will be set to WATCH_TYPE_META and subtype to +WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just +skip this record. Also, because this record is here, records cannot wrap round +the end of the buffer, so a skippable padding element will be inserted at the +end of the buffer if needed. Thus the contents of a notification record in the +buffer are always contiguous. + +"meta.mask" is an AND'able mask to turn the index counters into slots array +indices. + +The buffer is empty if "meta.head" == "meta.tail". + +[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into +"slots[]" not byte offsets into the buffer. + +[!] NOTE that userspace must never change the head pointer. This belongs to +the kernel and will be updated by that. The kernel will never change the tail +pointer. + +[!] NOTE that userspace must never AND-off the tail pointer before updating it, +but should just keep adding to it and letting it wrap naturally. The value +*should* be masked off when used as an index into slots[]. + +[!] NOTE that if the distance between head and tail becomes too great, the +kernel will assume the buffer is full and write no more until the issue is +resolved. 
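The index rules in the notes above (free-running counters, masked only when indexing slots[]) can be sketched as follows. The ex_* helper names are hypothetical and the functions are an illustration of the arithmetic, not code from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of slots between tail and head.  The counters run freely and
 * wrap at 2^32; unsigned subtraction still gives the right distance. */
static uint32_t ex_ring_used(uint32_t head, uint32_t tail)
{
	return head - tail;
}

/* The buffer is empty when the two counters are equal. */
static int ex_ring_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

/* Convert a free-running counter into an index into slots[].  This only
 * works because the number of slots is a power of two, which is why the
 * ring size must be a power-of-2 number of pages. */
static uint32_t ex_slot_index(uint32_t counter, uint32_t nr_slots)
{
	assert(nr_slots != 0 && (nr_slots & (nr_slots - 1)) == 0);
	return counter & (nr_slots - 1);
}
```

For instance, a one-page ring on a system with 4096-byte pages holds 512 eight-byte slots, so the mask would be 511. Note that the masked value is used only for indexing; the value stored back into the tail is the unmasked counter.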
+ + +Watch List (Notification Source) API +==================================== + +A "watch list" is a list of watchers that are subscribed to a source of +notifications. A list may be attached to an object (say a key or a superblock) +or may be global (say for device events). From a userspace perspective, a +non-global watch list is typically referred to by reference to the object it +belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to +watch that specific key). + +To manage a watch list, the following functions are provided: + + * ``void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *wlist));`` + + Initialise a watch list. If ``release_watch`` is not NULL, then this + indicates a function that should be called when the watch_list object is + destroyed to discard any references the watch list holds on the watched + object. + + * ``void remove_watch_list(struct watch_list *wlist);`` + + This removes all of the watches subscribed to a watch_list and frees them + and then destroys the watch_list object itself. + + +Watch Queue (Notification Buffer) API +===================================== + +A "watch queue" is the buffer allocated by or on behalf of the application that +notification records will be written into. The workings of this are hidden +entirely inside of the watch_queue device driver, but it is necessary to gain a +reference to it to place a watch. These can be managed with: + + * ``struct watch_queue *get_watch_queue(int fd);`` + + Since watch queues are indicated to the kernel by the fd of the character + device that implements the buffer, userspace must hand that fd through a + system call. This can be used to look up an opaque pointer to the watch + queue from the system call. + + * ``void put_watch_queue(struct watch_queue *wqueue);`` + + This discards the reference obtained from ``get_watch_queue()``. 
+ + +Watch Subscription API +====================== + +A "watch" is a subscription on a watch list, indicating the watch queue, and +thus the buffer, into which notification records should be written. The watch +queue object may also carry filtering rules for that object, as set by +userspace. Some parts of the watch struct can be set by the driver:: + + struct watch { + union { + u32 info_id; /* ID to be OR'd in to info field */ + ... + }; + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + ... + }; + +The ``info_id`` value should be an 8-bit number obtained from userspace and +shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of +struct watch_notification::info when and if the notification is written into +the associated watch queue buffer. + +The ``private`` field is the driver's data associated with the watch_list and +is cleaned up by the ``watch_list::release_watch()`` method. + +The ``id`` field is the source's ID. Notifications that are posted with a +different ID are ignored. + +The following functions are provided to manage watches: + + * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);`` + + Initialise a watch object, setting its pointer to the watch queue, using + appropriate barriering to avoid lockdep complaints. + + * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);`` + + Subscribe a watch to a watch list (notification source). The + driver-settable fields in the watch struct must have been set before this + is called. + + * ``int remove_watch_from_object(struct watch_list *wlist, + struct watch_queue *wqueue, + u64 id, false);`` + + Remove a watch from a watch list, where the watch must match the specified + watch queue (``wqueue``) and object identifier (``id``). A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to + indicate that the watch got removed. 
+ + * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);`` + + Remove all the watches from a watch list. It is expected that this will be + called preparatory to destruction and that the watch list will be + inaccessible to new watches by this point. A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each + subscribed watch to indicate that the watch got removed. + + +Notification Posting API +======================== + +To post a notification to a watch list so that the subscribed watches can see it, +the following function should be used:: + + void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id); + +The notification should be preformatted and a pointer to the header (``n``) +should be passed in. The notification may be larger than this and the size in +units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``. + +The ``cred`` struct indicates the credentials of the source (subject) and is +passed to the LSMs, such as SELinux, to allow or suppress the recording of the +note in each individual queue according to the credentials of that queue +(object). + +The ``id`` is the ID of the source object (such as the serial number on a key). +Only watches that have the same ID set in them will see this notification. + + +Watch Sources +============= + +Any particular buffer can be fed from multiple sources. Sources include: + + * WATCH_TYPE_KEY_NOTIFY + + Notifications of this type indicate changes to keys and keyrings, including + changes to keyring contents or to the attributes of keys. + + See Documentation/security/keys/core.rst for more information. + + * WATCH_TYPE_BLOCK_NOTIFY + + Notifications of this type indicate block layer events, such as I/O errors + or temporary link loss. Watches of this type are set on a global queue.
+ + +Event Filtering +=============== + +Once a watch queue has been created, a set of filters can be applied to limit +the events that are received using:: + + struct watch_notification_filter filter = { + ... + }; + ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) + +The filter description is a variable of type:: + + struct watch_notification_filter { + __u32 nr_filters; + __u32 __reserved; + struct watch_notification_type_filter filters[]; + }; + +Where "nr_filters" is the number of filters in filters[] and "__reserved" +should be 0. The "filters" array has elements of the following type:: + + struct watch_notification_type_filter { + __u32 type; + __u32 info_filter; + __u32 info_mask; + __u32 subtype_filter[8]; + }; + +Where: + + * ``type`` is the event type to filter for and should be something like + "WATCH_TYPE_KEY_NOTIFY" + + * ``info_filter`` and ``info_mask`` act as a filter on the info field of the + notification record. The notification is only written into the buffer if:: + + (watch.info & info_mask) == info_filter + + This could be used, for example, to ignore events that are not exactly on + the watched point in a mount tree. + + * ``subtype_filter`` is a bitmask indicating the subtypes that are of + interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to + subtype 1, and so on. + +If the argument to the ioctl() is NULL, then the filters will be removed and +all events from the watched sources will come through. + + +Waiting For Events +================== + +The file descriptor that holds the buffer may be used with poll() and similar. +POLLIN and POLLRDNORM are set if the buffer indices differ. POLLERR is set if +the buffer indices are further apart than the size of the buffer. Wake-up +events are only generated if the buffer is transitioned from an empty state. 
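The filter test described in the Event Filtering section (type match, subtype bit set, masked info equal to info_filter) can be sketched like this. struct ex_type_filter mirrors the layout shown for struct watch_notification_type_filter, but it and the ex_* helpers are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in with the same fields as the filter element shown above. */
struct ex_type_filter {
	uint32_t type;
	uint32_t info_filter;
	uint32_t info_mask;
	uint32_t subtype_filter[8];	/* 256 bits, one per subtype */
};

/* Start a filter with no subtypes selected.  Bits set in info_filter
 * must also be set in info_mask, or nothing could ever match. */
static void ex_filter_init(struct ex_type_filter *f, uint32_t type,
			   uint32_t info_filter, uint32_t info_mask)
{
	assert((info_filter & ~info_mask) == 0);
	*f = (struct ex_type_filter){
		.type = type,
		.info_filter = info_filter,
		.info_mask = info_mask,
	};
}

/* Mark a subtype as being of interest: bit (subtype % 32) of word
 * (subtype / 32). */
static void ex_filter_add_subtype(struct ex_type_filter *f, uint8_t subtype)
{
	f->subtype_filter[subtype >> 5] |= 1U << (subtype & 31);
}

/* Return non-zero if a record with this type, subtype and info passes. */
static int ex_filter_matches(const struct ex_type_filter *f, uint32_t type,
			     uint8_t subtype, uint32_t info)
{
	return type == f->type &&
	       (f->subtype_filter[subtype >> 5] & (1U << (subtype & 31))) != 0 &&
	       (info & f->info_mask) == f->info_filter;
}
```

With info_mask set to 0 and info_filter set to 0, the info test always passes, so only the type and subtype select records.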
+ + +Userspace Code Example +====================== + +A buffer is created with something like the following:: + + fd = open("/dev/watch_queue", O_RDWR); + + #define BUF_SIZE 4 + ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + +It can then be set to receive keyring change notifications and device event +notifications:: + + keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01); + + watch_devices(fd, 0x2); + +The notifications can then be consumed by something like the following:: + + extern void saw_key_change(struct watch_notification *n); + extern void saw_block_event(struct watch_notification *n); + extern void saw_usb_event(struct watch_notification *n); + + static int consumer(int fd, struct watch_queue_buffer *buf) + { + struct watch_notification *n; + struct pollfd p[1]; + unsigned int len, head, tail, mask = buf->meta.mask; + + for (;;) { + p[0].fd = fd; + p[0].events = POLLIN | POLLERR; + p[0].revents = 0; + + if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR) + goto went_wrong; + + while (head = _atomic_load_acquire(buf->meta.head), + tail = buf->meta.tail, + tail != head + ) { + n = &buf->slots[tail & mask]; + len = (n->info & WATCH_INFO_LENGTH) >> + WATCH_INFO_LENGTH__SHIFT; + if (len == 0) + goto went_wrong; + + switch (n->type) { + case WATCH_TYPE_KEY_NOTIFY: + saw_key_change(n); + break; + case WATCH_TYPE_BLOCK_NOTIFY: + saw_block_event(n); + break; + case WATCH_TYPE_USB_NOTIFY: + saw_usb_event(n); + break; + } + + tail += len; + _atomic_store_release(buf->meta.tail, tail); + } + } + + went_wrong: + return 0; + } + +Note the memory barriers when loading the head pointer and storing the tail +pointer! 
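The example above leaves _atomic_load_acquire() and _atomic_store_release() undefined. One possible way to provide such helpers in userspace, assuming a GCC or Clang toolchain, is to wrap the compiler's __atomic builtins as sketched below. The ex_* names are hypothetical, and unlike the helpers in the example these take a pointer to the index rather than the index itself.

```c
#include <assert.h>
#include <stdint.h>

/* Load an index with acquire semantics, so that reads of the ring
 * contents cannot be reordered before the load of the head index. */
static inline uint32_t ex_load_acquire(volatile uint32_t *p)
{
	return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

/* Store an index with release semantics, so that all reads of the
 * consumed record complete before the new tail becomes visible. */
static inline void ex_store_release(volatile uint32_t *p, uint32_t v)
{
	__atomic_store_n(p, v, __ATOMIC_RELEASE);
}

/* Advance the tail past a record of len slots.  The tail belongs to
 * userspace, so a plain read of the current value is sufficient. */
static inline void ex_advance_tail(volatile uint32_t *tail, uint32_t len)
{
	assert(len != 0);	/* a zero-length record would never advance */
	ex_store_release(tail, *tail + len);
}
```

In the consumer loop this would read as head = ex_load_acquire(&buf->meta.head) and ex_advance_tail(&buf->meta.tail, len).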
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..bef58948cf1b --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,893 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or + * userspace mucked up the ring indices). + */ +static bool write_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + struct watch_queue_buffer *buf = wqueue->buffer; + struct watch_notification *p; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + unsigned int size = wqueue->size, mask = size - 1; + unsigned int len; + unsigned int ring_tail, tail, head, used, gap, h; + + ring_tail = READ_ONCE(buf->meta.tail); + head = READ_ONCE(buf->meta.head); + used = head - ring_tail; + + /* Check to see if userspace mucked up the pointers */ + if (used >= size) + goto lost_event; /* Inconsistent */ + tail = ring_tail & mask; + if (tail > 0 && tail < metalen) + goto lost_event; /* Inconsistent */ + + len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + h = head & mask; + if (h >= tail) { + /* Head is at or after tail in the buffer. There may then be + * two gaps: one to the end of buffer and one at the beginning + * of the buffer between the metadata block and the tail + * pointer. + */ + gap = size - h; + if (len > gap) { + /* Not enough space in the post-head gap; we need to + * wrap. When wrapping, we will have to skip the + * metadata at the beginning of the buffer. + */ + if (len > tail - metalen) + goto lost_event; /* Overrun */ + + /* Fill the space at the end of the page */ + p = &buf->slots[h]; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = gap << WATCH_INFO_LENGTH__SHIFT; + head += gap; + h = 0; + if (h >= tail) + goto lost_event; /* Overrun */ + } + } + + if (h == 0) { + /* Reset and skip the header metadata */ + p = &buf->meta.watch; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = metalen << WATCH_INFO_LENGTH__SHIFT; + head += metalen; + h = metalen; + if (h == tail) + goto lost_event; /* Overrun */ + } + + if (h < tail) { + /* Head is before tail in the buffer. 
*/ + gap = tail - h; + if (len > gap) + goto lost_event; /* Overrun */ + } + + n->info |= wqueue->flag_next; + wqueue->flag_next = 0; + p = &buf->slots[h]; + memcpy(p, n, len * gran); + head += len; + + smp_store_release(&buf->meta.head, head); + if (used == 0) + wake_up(&wqueue->waiters); + return true; + +lost_event: + WRITE_ONCE(buf->meta.watch.info, + buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST); + return false; +} + +/* + * Post a notification to a watch queue. + */ +static bool post_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + bool done = false; + + if (!wqueue->buffer) + return false; + + spin_lock_bh(&wqueue->lock); /* Protect head pointer */ + + if (!wqueue->defunct) + done = write_one_notification(wqueue, n); + spin_unlock_bh(&wqueue->lock); + return done; +} + +/* + * Apply filter rules to a notification. + */ +static bool filter_watch_notification(const struct watch_filter *wf, + const struct watch_notification *n) +{ + const struct watch_type_filter *wt; + int i; + + if (!test_bit(n->type, wf->type_filter)) + return false; + + for (i = 0; i < wf->nr_filters; i++) { + wt = &wf->filters[i]; + if (n->type == wt->type && + (wt->subtype_filter[n->subtype >> 5] & + (1U << (n->subtype & 31))) && + (n->info & wt->info_mask) == wt->info_filter) + return true; + } + + return false; /* If there is a filter, the default is to reject. */ +} + +/** + * __post_watch_notification - Post an event notification + * @wlist: The watch list to post the event to. + * @n: The notification record to post. + * @cred: The creds of the process that triggered the notification. + * @id: The ID to match on the watch. + * + * Post a notification of an event into a set of watch queues and let the users + * know. + * + * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and + * should be in units of sizeof(*n). 
+ */ +void __post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + const struct watch_filter *wf; + struct watch_queue *wqueue; + struct watch *watch; + + if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) { + WARN_ON(1); + return; + } + + rcu_read_lock(); + + hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) { + if (watch->id != id) + continue; + n->info &= ~WATCH_INFO_ID; + n->info |= watch->info_id; + + wqueue = rcu_dereference(watch->queue); + wf = rcu_dereference(wqueue->filter); + if (wf && !filter_watch_notification(wf, n)) + continue; + + if (security_post_notification(watch->cred, cred, n) < 0) + continue; + + post_one_notification(wqueue, n); + } + + rcu_read_unlock(); +} +EXPORT_SYMBOL(__post_watch_notification); + +/* + * Allow the queue to be polled. + */ +static __poll_t watch_queue_poll(struct file *file, poll_table *wait) +{ + struct watch_queue *wqueue = file->private_data; + struct watch_queue_buffer *buf = wqueue->buffer; + unsigned int head, tail; + __poll_t mask = 0; + + if (!buf) + return EPOLLERR; + + poll_wait(file, &wqueue->waiters, wait); + + head = READ_ONCE(buf->meta.head); + tail = READ_ONCE(buf->meta.tail); + if (head != tail) + mask |= EPOLLIN | EPOLLRDNORM; + if (head - tail > wqueue->size) + mask |= EPOLLERR; + return mask; +} + +static int watch_queue_set_page_dirty(struct page *page) +{ + SetPageDirty(page); + return 0; +} + +static const struct address_space_operations watch_queue_aops = { + .set_page_dirty = watch_queue_set_page_dirty, +}; + +static vm_fault_t watch_queue_fault(struct vm_fault *vmf) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + page = wqueue->pages[vmf->pgoff]; + get_page(page); + if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { + put_page(page); + return VM_FAULT_RETRY; + } + vmf->page = page; + return VM_FAULT_LOCKED; +} + +static int 
watch_queue_account_mem(struct watch_queue *wqueue, + unsigned long nr_pages) +{ + struct user_struct *user = wqueue->owner; + unsigned long page_limit, cur_pages, new_pages; + + /* Don't allow more pages than we can safely lock */ + page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + cur_pages = atomic_long_read(&user->locked_vm); + + do { + new_pages = cur_pages + nr_pages; + if (new_pages > page_limit && !capable(CAP_IPC_LOCK)) + return -ENOMEM; + } while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages, + new_pages)); + + wqueue->nr_pages = nr_pages; + return 0; +} + +static void watch_queue_unaccount_mem(struct watch_queue *wqueue) +{ + struct user_struct *user = wqueue->owner; + + if (wqueue->nr_pages) { + atomic_long_sub(wqueue->nr_pages, &user->locked_vm); + wqueue->nr_pages = 0; + } +} + +static void watch_queue_map_pages(struct vm_fault *vmf, + pgoff_t start_pgoff, pgoff_t end_pgoff) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + rcu_read_lock(); + + do { + page = wqueue->pages[start_pgoff]; + if (trylock_page(page)) { + vm_fault_t ret; + get_page(page); + ret = alloc_set_pte(vmf, NULL, page); + if (ret != 0) + put_page(page); + + unlock_page(page); + } + } while (++start_pgoff < end_pgoff); + + rcu_read_unlock(); +} + +static const struct vm_operations_struct watch_queue_vm_ops = { + .fault = watch_queue_fault, + .map_pages = watch_queue_map_pages, +}; + +/* + * Map the buffer.
+ */ +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + u8 nr_pages; + + inode_lock(inode); + nr_pages = wqueue->nr_pages; + inode_unlock(inode); + + if (nr_pages == 0 || + vma->vm_pgoff != 0 || + vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE || + !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED))) + return -EINVAL; + + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_ops = &watch_queue_vm_ops; + return 0; +} + +/* + * Allocate the required number of pages. + */ +static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages) +{ + struct watch_queue_buffer *buf; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + int i; + + BUILD_BUG_ON(gran != sizeof(__u64)); + + if (wqueue->buffer) + return -EBUSY; + + if (nr_pages == 0 || + nr_pages > 16 || /* TODO: choose a better hard limit */ + !is_power_of_2(nr_pages)) + return -EINVAL; + + if (watch_queue_account_mem(wqueue, nr_pages) < 0) + goto err; + + wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!wqueue->pages) + goto err_unaccount; + + for (i = 0; i < nr_pages; i++) { + wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!wqueue->pages[i]) + goto err_some_pages; + wqueue->pages[i]->mapping = &wqueue->mapping; + SetPageUptodate(wqueue->pages[i]); + } + + buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED); + if (!buf) + goto err_some_pages; + + wqueue->buffer = buf; + wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification)); + + /* The first four slots in the buffer contain metadata about the ring, + * including the head and tail indices and mask. 
+ */ + buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT; + buf->meta.watch.type = WATCH_TYPE_META; + buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION; + buf->meta.mask = wqueue->size - 1; + buf->meta.head = metalen; + buf->meta.tail = metalen; + return 0; + +err_some_pages: + for (i--; i >= 0; i--) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + put_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + wqueue->pages = NULL; +err_unaccount: + watch_queue_unaccount_mem(wqueue); +err: + return -ENOMEM; +} + +/* + * Set the filter on a watch queue. + */ +static long watch_queue_set_filter(struct inode *inode, + struct watch_queue *wqueue, + struct watch_notification_filter __user *_filter) +{ + struct watch_notification_type_filter *tf; + struct watch_notification_filter filter; + struct watch_type_filter *q; + struct watch_filter *wfilter; + int ret, nr_filter = 0, i; + + if (!_filter) { + /* Remove the old filter */ + wfilter = NULL; + goto set; + } + + /* Grab the user's filter specification */ + if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) + return -EFAULT; + if (filter.nr_filters == 0 || + filter.nr_filters > 16 || + filter.__reserved != 0) + return -EINVAL; + + tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + if (IS_ERR(tf)) + return PTR_ERR(tf); + + ret = -EINVAL; + for (i = 0; i < filter.nr_filters; i++) { + if ((tf[i].info_filter & ~tf[i].info_mask) || + tf[i].info_mask & WATCH_INFO_LENGTH) + goto err_filter; + /* Ignore any unknown types */ + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + nr_filter++; + } + + /* Now we need to build the internal filter from only the relevant + * user-specified filters. 
+ */ + ret = -ENOMEM; + wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL); + if (!wfilter) + goto err_filter; + wfilter->nr_filters = nr_filter; + + q = wfilter->filters; + for (i = 0; i < filter.nr_filters; i++) { + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + + q->type = tf[i].type; + q->info_filter = tf[i].info_filter; + q->info_mask = tf[i].info_mask; + q->subtype_filter[0] = tf[i].subtype_filter[0]; + __set_bit(q->type, wfilter->type_filter); + q++; + } + + kfree(tf); +set: + inode_lock(inode); + rcu_swap_protected(wqueue->filter, wfilter, + lockdep_is_held(&inode->i_rwsem)); + inode_unlock(inode); + if (wfilter) + kfree_rcu(wfilter, rcu); + return 0; + +err_filter: + kfree(tf); + return ret; +} + +/* + * Set parameters. + */ +static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + long ret; + + switch (cmd) { + case IOC_WATCH_QUEUE_SET_SIZE: + inode_lock(inode); + ret = watch_queue_set_size(wqueue, arg); + inode_unlock(inode); + return ret; + + case IOC_WATCH_QUEUE_SET_FILTER: + ret = watch_queue_set_filter( + inode, wqueue, + (struct watch_notification_filter __user *)arg); + return ret; + + default: + return -ENOTTY; + } +} + +/* + * Open the file.
+ */ +static int watch_queue_open(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue; + + wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL); + if (!wqueue) + return -ENOMEM; + + wqueue->mapping.a_ops = &watch_queue_aops; + wqueue->mapping.i_mmap = RB_ROOT_CACHED; + init_rwsem(&wqueue->mapping.i_mmap_rwsem); + spin_lock_init(&wqueue->mapping.private_lock); + + kref_init(&wqueue->usage); + spin_lock_init(&wqueue->lock); + init_waitqueue_head(&wqueue->waiters); + wqueue->owner = get_uid(file->f_cred->user); + + file->private_data = wqueue; + return 0; +} + +static void __put_watch_queue(struct kref *kref) +{ + struct watch_queue *wqueue = + container_of(kref, struct watch_queue, usage); + struct watch_filter *wfilter; + + wfilter = rcu_access_pointer(wqueue->filter); + if (wfilter) + kfree_rcu(wfilter, rcu); + free_uid(wqueue->owner); + kfree_rcu(wqueue, rcu); +} + +/** + * put_watch_queue - Dispose of a ref on a watch queue. + * @wqueue: The watch queue to unref. + */ +void put_watch_queue(struct watch_queue *wqueue) +{ + kref_put(&wqueue->usage, __put_watch_queue); +} +EXPORT_SYMBOL(put_watch_queue); + +static void free_watch(struct rcu_head *rcu) +{ + struct watch *watch = container_of(rcu, struct watch, rcu); + + put_watch_queue(rcu_access_pointer(watch->queue)); + put_cred(watch->cred); +} + +static void __put_watch(struct kref *kref) +{ + struct watch *watch = container_of(kref, struct watch, usage); + + call_rcu(&watch->rcu, free_watch); +} + +/* + * Discard a watch. + */ +static void put_watch(struct watch *watch) +{ + kref_put(&watch->usage, __put_watch); +} + +/** + * init_watch - Initialise a watch + * @watch: The watch to initialise. + * @wqueue: The queue to assign. + * + * Initialise a watch and set the watch queue.
+ */ +void init_watch(struct watch *watch, struct watch_queue *wqueue) +{ + kref_init(&watch->usage); + INIT_HLIST_NODE(&watch->list_node); + INIT_HLIST_NODE(&watch->queue_node); + rcu_assign_pointer(watch->queue, wqueue); +} + +/** + * add_watch_to_object - Add a watch on an object to a watch list + * @watch: The watch to add + * @wlist: The watch list to add to + * + * @watch->queue must have been set to point to the queue to post notifications + * to and the watch list of the object to be watched. @watch->cred must also + * have been set to the appropriate credentials and a ref taken on them. + * + * The caller must pin the queue and the list both and must hold the list + * locked against racing watch additions/removals. + */ +int add_watch_to_object(struct watch *watch, struct watch_list *wlist) +{ + struct watch_queue *wqueue = rcu_access_pointer(watch->queue); + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + watch->cred = get_current_cred(); + rcu_assign_pointer(watch->watch_list, wlist); + + spin_lock_bh(&wqueue->lock); + kref_get(&wqueue->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_unlock_bh(&wqueue->lock); + + hlist_add_head(&watch->list_node, &wlist->watchers); + return 0; +} +EXPORT_SYMBOL(add_watch_to_object); + +/** + * remove_watch_from_object - Remove a watch or all watches from an object. + * @wlist: The watch list to remove from + * @wq: The watch queue of interest (ignored if @all is true) + * @id: The ID of the watch to remove (ignored if @all is true) + * @all: True to remove all objects + * + * Remove a specific watch or all watches from an object. A notification is + * sent to the watcher to tell them that this happened. 
+ */ +int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, + u64 id, bool all) +{ + struct watch_notification_removal n; + struct watch_queue *wqueue; + struct watch *watch; + int ret = -EBADSLT; + + rcu_read_lock(); + +again: + spin_lock(&wlist->lock); + hlist_for_each_entry(watch, &wlist->watchers, list_node) { + if (all || + (watch->id == id && rcu_access_pointer(watch->queue) == wq)) + goto found; + } + spin_unlock(&wlist->lock); + goto out; + +found: + ret = 0; + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + spin_unlock(&wlist->lock); + + /* We now own the reference on watch that used to belong to wlist. */ + + n.watch.type = WATCH_TYPE_META; + n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; + n.watch.info = watch->info_id | watch_sizeof(n.watch); + n.id = id; + if (id != 0) + n.watch.info = watch->info_id | watch_sizeof(n); + + wqueue = rcu_dereference(watch->queue); + + /* We don't need the watch list lock for the next bit as RCU is + * protecting *wqueue from deallocation. + */ + if (wqueue) { + post_one_notification(wqueue, &n.watch); + + spin_lock_bh(&wqueue->lock); + + if (!hlist_unhashed(&watch->queue_node)) { + hlist_del_init_rcu(&watch->queue_node); + put_watch(watch); + } + + spin_unlock_bh(&wqueue->lock); + } + + if (wlist->release_watch) { + void (*release_watch)(struct watch *); + + release_watch = wlist->release_watch; + rcu_read_unlock(); + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + + if (all && !hlist_empty(&wlist->watchers)) + goto again; +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(remove_watch_from_object); + +/* + * Remove all the watches that are contributory to a queue. This has the + * potential to race with removal of the watches by the destruction of the + * objects being watched or with the distribution of notifications. 
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */ +static int watch_queue_release(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue = file->private_data; + int i; + + watch_queue_clear(wqueue); + + if (wqueue->buffer) + vunmap(wqueue->buffer); + + for (i = 0; i < wqueue->nr_pages; i++) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + __free_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + watch_queue_unaccount_mem(wqueue); + put_watch_queue(wqueue); + return 0; +} + +static const struct file_operations watch_queue_fops = { + .owner = THIS_MODULE, + .open = watch_queue_open, + .release = watch_queue_release, + .unlocked_ioctl = watch_queue_ioctl, + .poll = watch_queue_poll, + .mmap = watch_queue_mmap, + .llseek = no_llseek, +}; + +/** + * get_watch_queue - Get a watch queue from its file descriptor. + * @fd: The fd to query. + */ +struct watch_queue *get_watch_queue(int fd) +{ + struct watch_queue *wqueue = ERR_PTR(-EBADF); + struct fd f; + + f = fdget(fd); + if (f.file) { + wqueue = ERR_PTR(-EINVAL); + if (f.file->f_op == &watch_queue_fops) { + wqueue = f.file->private_data; + kref_get(&wqueue->usage); + } + fdput(f); + } + + return wqueue; +} +EXPORT_SYMBOL(get_watch_queue); + +static struct miscdevice watch_queue_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "watch_queue", + .fops = &watch_queue_fops, + .mode = 0666, +}; +builtin_misc_device(watch_queue_dev); diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h new file mode 100644 index 000000000000..34d7915cc5b3 --- /dev/null +++ b/include/linux/watch_queue.h @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. + */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + 
+static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. + */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. 
+ * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #7] @ 2019-08-30 13:57 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:57 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Implement a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_NR_PAGES, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink). 
The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 893 ++++++++++++++++++++++++++++++++++ include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 7 files changed, 1465 insertions(+) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============== +Mappable notifications queue +============== + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. 
This can be used in conjunction with:: + + * Key/keyring notifications + + * General device event notifications + + +The notifications buffers can be enabled by: + + "Device Drivers"/"Misc devices"/"Mappable notification queue" + (CONFIG_WATCH_QUEUE) + +This document has the following sections: + +.. contents:: :local: + + +Overview +==== + +This facility appears as a misc device file that is opened and then mapped and +polled. Each time it is opened, it creates a new buffer specific to the +returned file descriptor. Then, when the opening process sets watches, it +indicates the particular buffer it wants notifications from that watch to be +written into. Note that there are no read() and write() methods (except for +debugging). The user is expected to access the ring directly and to use poll +to wait for new data. + +If a watch is in place, notifications are only written into the buffer if the +filter criteria are passed and if there's sufficient space available in the +ring. If neither of those is so, a notification will be discarded. In the +latter case, an overrun indicator will also be set. + +Note that when producing a notification, the kernel does not wait for the +consumers to collect it, but rather just continues on. This means that +notifications can be generated whilst spinlocks are held and also protects the +kernel from being held up indefinitely by a userspace malfunction. + +As far as the ring goes, the head index belongs to the kernel and the tail +index belongs to userspace. The kernel will refuse to write anything if the +tail index becomes invalid. Userspace *must* use appropriate memory barriers +between reading or updating the tail index and reading the ring. 
+ + +Record Structure +======== + +Notification records in the ring may occupy a variable number of slots within +the buffer, beginning with a 1-slot header:: + + struct watch_notification { + __u32 type:24; + __u32 subtype:8; + __u32 info; + } __attribute__((aligned(WATCH_LENGTH_GRANULARITY))); + +"type" indicates the source of the notification record and "subtype" indicates +the type of record from that source (see the Watch Sources section below). The +type may also be "WATCH_TYPE_META". This is a special record type generated +internally by the watch queue driver itself. There are two subtypes: + + * WATCH_META_SKIP_NOTIFICATION + * WATCH_META_REMOVAL_NOTIFICATION + +The former indicates a record that should just be skipped (padding or +metadata) and the latter indicates that an object on which a watch was +installed was removed or destroyed. + +"info" indicates a bunch of things, including: + + * The length of the record in units of buffer slots (mask with + WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates + the size of the record, which may be between 1 and 63 slots. To turn this + into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY. + + * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT). + This indicates the caller's ID of the watch, which may be between 0 + and 255. Multiple watches may share a queue, and this provides a means to + distinguish them. + + * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST) + that indicates that some notifications were lost for some reason, including + buffer overrun, insufficient memory and inconsistent tail index. + + * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the + notification producer to indicate some meaning specific to the type and + subtype. + +Everything in info apart from the length can be used for filtering.
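The field extraction described for "info" can be sketched in plain C as below. This is only an illustration: the ``EX_`` constants are stand-ins chosen to match the ranges quoted in the text (1 to 63 slots, IDs 0 to 255, 8-byte slots), and the real values must be taken from ``linux/watch_queue.h``, which may differ. The helper names are hypothetical.

```c
#include <stdint.h>

/* Illustrative stand-ins only; the real masks and shifts live in
 * <linux/watch_queue.h> and are assumed here, not quoted from it.
 */
#define EX_WATCH_LENGTH_GRANULARITY	8u		/* bytes per slot */
#define EX_WATCH_INFO_LENGTH		0x0000003fu	/* assumed: 6 bits, 1..63 slots */
#define EX_WATCH_INFO_LENGTH__SHIFT	0
#define EX_WATCH_INFO_ID		0x0000ff00u	/* assumed: 8 bits, IDs 0..255 */
#define EX_WATCH_INFO_ID__SHIFT		8

/* Number of slots the record occupies, header slot included. */
static inline unsigned int ex_record_slots(uint32_t info)
{
	return (info & EX_WATCH_INFO_LENGTH) >> EX_WATCH_INFO_LENGTH__SHIFT;
}

/* Record size in bytes: slot count times the slot granularity. */
static inline unsigned int ex_record_bytes(uint32_t info)
{
	return ex_record_slots(info) * EX_WATCH_LENGTH_GRANULARITY;
}

/* The caller-chosen watch ID that was OR'd into the info field. */
static inline unsigned int ex_watch_id(uint32_t info)
{
	return (info & EX_WATCH_INFO_ID) >> EX_WATCH_INFO_ID__SHIFT;
}
```

A consumer would normally treat a slot count of zero as a corrupt record and stop, since it could otherwise never advance past it.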
+ + +Ring Structure +======= + +The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The +caller uses an ioctl() to set the size of the ring after opening and this must +be a power-of-2 multiple of the system page size (so that the mask can be used +with AND). + +The head and tail indices are stored in the first two slots in the ring, which +are marked out as a skippable entry:: + + struct watch_queue_buffer { + union { + struct { + struct watch_notification watch; + volatile __u32 head; + volatile __u32 tail; + __u32 mask; + } meta; + struct watch_notification slots[0]; + }; + }; + +In "meta.watch", type will be set to WATCH_TYPE_META and subtype to +WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just +skip this record. Also, because this record is here, records cannot wrap round +the end of the buffer, so a skippable padding element will be inserted at the +end of the buffer if needed. Thus the contents of a notification record in the +buffer are always contiguous. + +"meta.mask" is an AND'able mask to turn the index counters into slots array +indices. + +The buffer is empty if "meta.head" == "meta.tail". + +[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into +"slots[]" not byte offsets into the buffer. + +[!] NOTE that userspace must never change the head pointer. This belongs to +the kernel and will only be updated by the kernel. The kernel will never +change the tail pointer. + +[!] NOTE that userspace must never AND-off the tail pointer before updating it, +but should just keep adding to it and let it wrap naturally. The value +*should* be masked off when used as an index into slots[]. + +[!] NOTE that if the distance between head and tail becomes too great, the +kernel will assume the buffer is full and write no more until the issue is +resolved. + + +Watch List (Notification Source) API +================== + +A "watch list" is a list of watchers that are subscribed to a source of +notifications.
A list may be attached to an object (say a key or a superblock) +or may be global (say for device events). From a userspace perspective, a +non-global watch list is typically referred to by reference to the object it +belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to +watch that specific key). + +To manage a watch list, the following functions are provided: + + * ``void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *wlist));`` + + Initialise a watch list. If ``release_watch`` is not NULL, then this + indicates a function that should be called when the watch_list object is + destroyed to discard any references the watch list holds on the watched + object. + + * ``void remove_watch_list(struct watch_list *wlist);`` + + This removes all of the watches subscribed to a watch_list and frees them + and then destroys the watch_list object itself. + + +Watch Queue (Notification Buffer) API +==================+ +A "watch queue" is the buffer allocated by or on behalf of the application that +notification records will be written into. The workings of this are hidden +entirely inside of the watch_queue device driver, but it is necessary to gain a +reference to it to place a watch. These can be managed with: + + * ``struct watch_queue *get_watch_queue(int fd);`` + + Since watch queues are indicated to the kernel by the fd of the character + device that implements the buffer, userspace must hand that fd through a + system call. This can be used to look up an opaque pointer to the watch + queue from the system call. + + * ``void put_watch_queue(struct watch_queue *wqueue);`` + + This discards the reference obtained from ``get_watch_queue()``. + + +Watch Subscription API +=========== + +A "watch" is a subscription on a watch list, indicating the watch queue, and +thus the buffer, into which notification records should be written. The watch +queue object may also carry filtering rules for that object, as set by +userspace. 
Some parts of the watch struct can be set by the driver:: + + struct watch { + union { + u32 info_id; /* ID to be OR'd in to info field */ + ... + }; + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + ... + }; + +The ``info_id`` value should be an 8-bit number obtained from userspace and +shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of +struct watch_notification::info when and if the notification is written into +the associated watch queue buffer. + +The ``private`` field is the driver's data associated with the watch_list and +is cleaned up by the ``watch_list::release_watch()`` method. + +The ``id`` field is the source's ID. Notifications that are posted with a +different ID are ignored. + +The following functions are provided to manage watches: + + * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);`` + + Initialise a watch object, setting its pointer to the watch queue, using + appropriate barriering to avoid lockdep complaints. + + * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);`` + + Subscribe a watch to a watch list (notification source). The + driver-settable fields in the watch struct must have been set before this + is called. + + * ``int remove_watch_from_object(struct watch_list *wlist, + struct watch_queue *wqueue, + u64 id, false);`` + + Remove a watch from a watch list, where the watch must match the specified + watch queue (``wqueue``) and object identifier (``id``). A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to + indicate that the watch got removed. + + * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);`` + + Remove all the watches from a watch list. It is expected that this will be + called preparatory to destruction and that the watch list will be + inaccessible to new watches by this point. 
A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each + subscribed watch to indicate that the watch got removed. + + +Notification Posting API +============ + +To post a notification to a watch list so that the subscribed watches can see +it, the following function should be used:: + + void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id); + +The notification should be preformatted and a pointer to the header (``n``) +should be passed in. The notification may be larger than this and the size in +units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``. + +The ``cred`` struct indicates the credentials of the source (subject) and is +passed to the LSMs, such as SELinux, to allow or suppress the recording of the +note in each individual queue according to the credentials of that queue +(object). + +The ``id`` is the ID of the source object (such as the serial number on a key). +Only watches that have the same ID set in them will see this notification. + + +Watch Sources +====== + +Any particular buffer can be fed from multiple sources. Sources include: + + * WATCH_TYPE_KEY_NOTIFY + + Notifications of this type indicate changes to keys and keyrings, including + changes to keyring contents or to the attributes of keys. + + See Documentation/security/keys/core.rst for more information. + + * WATCH_TYPE_BLOCK_NOTIFY + + Notifications of this type indicate block layer events, such as I/O errors + or temporary link loss. Watches of this type are set on a global queue. + + +Event Filtering +======= + +Once a watch queue has been created, a set of filters can be applied to limit +the events that are received using:: + + struct watch_notification_filter filter = { + ...
+ }; + ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); + +The filter description is a variable of type:: + + struct watch_notification_filter { + __u32 nr_filters; + __u32 __reserved; + struct watch_notification_type_filter filters[]; + }; + +Where "nr_filters" is the number of filters in filters[] and "__reserved" +should be 0. The "filters" array has elements of the following type:: + + struct watch_notification_type_filter { + __u32 type; + __u32 info_filter; + __u32 info_mask; + __u32 subtype_filter[8]; + }; + +Where: + + * ``type`` is the event type to filter for and should be something like + "WATCH_TYPE_KEY_NOTIFY" + + * ``info_filter`` and ``info_mask`` act as a filter on the info field of the + notification record. The notification is only written into the buffer if:: + + (watch.info & info_mask) == info_filter + + This could be used, for example, to ignore events that are not exactly on + the watched point in a mount tree. + + * ``subtype_filter`` is a bitmask indicating the subtypes that are of + interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to + subtype 1, and so on. + +If the argument to the ioctl() is NULL, then the filters will be removed and +all events from the watched sources will come through. + + +Waiting For Events +========= + +The file descriptor that holds the buffer may be used with poll() and similar. +POLLIN and POLLRDNORM are set if the buffer indices differ. POLLERR is set if +the buffer indices are further apart than the size of the buffer. Wake-up +events are only generated if the buffer is transitioned from an empty state.
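The subtype bitmask and info matching rules from the Event Filtering section can be sketched as follows. This is a userspace illustration under stated assumptions, not the driver's code: ``struct ex_type_filter`` is a local stand-in with the same fields as ``struct watch_notification_type_filter``, and ``ex_allow_subtype()`` and ``ex_filter_matches()`` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in mirroring struct watch_notification_type_filter. */
struct ex_type_filter {
	uint32_t type;			/* Type to apply filter to */
	uint32_t info_filter;		/* Value the masked info must equal */
	uint32_t info_mask;		/* Bits of info that are compared */
	uint32_t subtype_filter[8];	/* 256 bits, one per subtype */
};

/* Mark one subtype as being of interest: word is subtype / 32,
 * bit within the word is subtype % 32.
 */
static inline void ex_allow_subtype(struct ex_type_filter *f, uint8_t subtype)
{
	f->subtype_filter[subtype >> 5] |= UINT32_C(1) << (subtype & 31);
}

/* A record passes if the type matches, its subtype bit is set and the
 * masked info field equals info_filter.
 */
static inline bool ex_filter_matches(const struct ex_type_filter *f,
				     uint32_t type, uint8_t subtype,
				     uint32_t info)
{
	return type == f->type &&
	       (f->subtype_filter[subtype >> 5] &
		(UINT32_C(1) << (subtype & 31))) != 0 &&
	       (info & f->info_mask) == f->info_filter;
}
```

Note that the set-filter code in this version of the patch copies only ``subtype_filter[0]`` into the kernel's copy, so it looks as though only subtypes 0 to 31 can currently be selected, even though the user structure has room for 256.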
+ + +Userspace Code Example +=========== + +A buffer is created with something like the following:: + + fd = open("/dev/watch_queue", O_RDWR); + + #define BUF_SIZE 4 + ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + +It can then be set to receive keyring change notifications and device event +notifications:: + + keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01); + + watch_devices(fd, 0x2); + +The notifications can then be consumed by something like the following:: + + extern void saw_key_change(struct watch_notification *n); + extern void saw_block_event(struct watch_notification *n); + extern void saw_usb_event(struct watch_notification *n); + + static int consumer(int fd, struct watch_queue_buffer *buf) + { + struct watch_notification *n; + struct pollfd p[1]; + unsigned int len, head, tail, mask = buf->meta.mask; + + for (;;) { + p[0].fd = fd; + p[0].events = POLLIN | POLLERR; + p[0].revents = 0; + + if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR) + goto went_wrong; + + while (head = _atomic_load_acquire(buf->meta.head), + tail = buf->meta.tail, + tail != head + ) { + n = &buf->slots[tail & mask]; + len = (n->info & WATCH_INFO_LENGTH) >> + WATCH_INFO_LENGTH__SHIFT; + if (len == 0) + goto went_wrong; + + switch (n->type) { + case WATCH_TYPE_KEY_NOTIFY: + saw_key_change(n); + break; + case WATCH_TYPE_BLOCK_NOTIFY: + saw_block_event(n); + break; + case WATCH_TYPE_USB_NOTIFY: + saw_usb_event(n); + break; + } + + tail += len; + _atomic_store_release(buf->meta.tail, tail); + } + } + + went_wrong: + return 0; + } + +Note the memory barriers when loading the head pointer and storing the tail +pointer!
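The index handling that the consumer relies on (free-running head and tail counters, masked only when used to index ``slots[]``) can be sketched on its own as below. The helper names are hypothetical, and the mask is assumed to be the ring size in slots minus one, as ``meta.mask`` is described in the Ring Structure section.

```c
#include <assert.h>
#include <stdint.h>

/* Slots currently occupied.  The indices are free-running 32-bit
 * counters, so the unsigned subtraction stays correct across wrap.
 */
static inline uint32_t ex_ring_used(uint32_t head, uint32_t tail)
{
	return head - tail;
}

/* Turn a free-running index into a position in slots[]. */
static inline uint32_t ex_ring_slot(uint32_t index, uint32_t mask)
{
	assert((mask & (mask + 1)) == 0);	/* mask must be 2^n - 1 */
	return index & mask;
}

/* Advance the tail past a record of len slots.  The result is stored
 * back unmasked and is allowed to wrap naturally.
 */
static inline uint32_t ex_ring_advance(uint32_t tail, uint32_t len)
{
	return tail + len;
}
```

For example, with four 4096-byte pages there are 2048 slots and the mask is 2047; a tail of 0xfffffffe indexes slot 2046, and advancing it by 3 slots wraps the counter to 1, which indexes slot 1.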
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..bef58948cf1b --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,893 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or
+ * userspace mucked up the ring indices).
+ */
+static bool write_one_notification(struct watch_queue *wqueue,
+				   struct watch_notification *n)
+{
+	struct watch_queue_buffer *buf = wqueue->buffer;
+	struct watch_notification *p;
+	unsigned int gran = WATCH_LENGTH_GRANULARITY;
+	unsigned int metalen = sizeof(buf->meta) / gran;
+	unsigned int size = wqueue->size, mask = size - 1;
+	unsigned int len;
+	unsigned int ring_tail, tail, head, used, gap, h;
+
+	ring_tail = READ_ONCE(buf->meta.tail);
+	head = READ_ONCE(buf->meta.head);
+	used = head - ring_tail;
+
+	/* Check to see if userspace mucked up the pointers */
+	if (used >= size)
+		goto lost_event; /* Inconsistent */
+	tail = ring_tail & mask;
+	if (tail > 0 && tail < metalen)
+		goto lost_event; /* Inconsistent */
+
+	len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+	h = head & mask;
+	if (h >= tail) {
+		/* Head is at or after tail in the buffer.  There may then be
+		 * two gaps: one to the end of buffer and one at the beginning
+		 * of the buffer between the metadata block and the tail
+		 * pointer.
+		 */
+		gap = size - h;
+		if (len > gap) {
+			/* Not enough space in the post-head gap; we need to
+			 * wrap.  When wrapping, we will have to skip the
+			 * metadata at the beginning of the buffer.
+			 */
+			if (len > tail - metalen)
+				goto lost_event; /* Overrun */
+
+			/* Fill the space at the end of the page */
+			p = &buf->slots[h];
+			p->type = WATCH_TYPE_META;
+			p->subtype = WATCH_META_SKIP_NOTIFICATION;
+			p->info = gap << WATCH_INFO_LENGTH__SHIFT;
+			head += gap;
+			h = 0;
+			if (h >= tail)
+				goto lost_event; /* Overrun */
+		}
+	}
+
+	if (h == 0) {
+		/* Reset and skip the header metadata */
+		p = &buf->meta.watch;
+		p->type = WATCH_TYPE_META;
+		p->subtype = WATCH_META_SKIP_NOTIFICATION;
+		p->info = metalen << WATCH_INFO_LENGTH__SHIFT;
+		head += metalen;
+		h = metalen;
+		if (h == tail)
+			goto lost_event; /* Overrun */
+	}
+
+	if (h < tail) {
+		/* Head is before tail in the buffer.
*/
+		gap = tail - h;
+		if (len > gap)
+			goto lost_event; /* Overrun */
+	}
+
+	n->info |= wqueue->flag_next;
+	wqueue->flag_next = 0;
+	p = &buf->slots[h];
+	memcpy(p, n, len * gran);
+	head += len;
+
+	smp_store_release(&buf->meta.head, head);
+	if (used == 0)
+		wake_up(&wqueue->waiters);
+	return true;
+
+lost_event:
+	WRITE_ONCE(buf->meta.watch.info,
+		   buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST);
+	return false;
+}
+
+/*
+ * Post a notification to a watch queue.
+ */
+static bool post_one_notification(struct watch_queue *wqueue,
+				  struct watch_notification *n)
+{
+	bool done = false;
+
+	if (!wqueue->buffer)
+		return false;
+
+	spin_lock_bh(&wqueue->lock); /* Protect head pointer */
+
+	if (!wqueue->defunct)
+		done = write_one_notification(wqueue, n);
+	spin_unlock_bh(&wqueue->lock);
+	return done;
+}
+
+/*
+ * Apply filter rules to a notification.
+ */
+static bool filter_watch_notification(const struct watch_filter *wf,
+				      const struct watch_notification *n)
+{
+	const struct watch_type_filter *wt;
+	int i;
+
+	if (!test_bit(n->type, wf->type_filter))
+		return false;
+
+	for (i = 0; i < wf->nr_filters; i++) {
+		wt = &wf->filters[i];
+		if (n->type == wt->type &&
+		    (wt->subtype_filter[n->subtype >> 5] &
+		     (1U << (n->subtype & 31))) &&
+		    (n->info & wt->info_mask) == wt->info_filter)
+			return true;
+	}
+
+	return false; /* If there is a filter, the default is to reject. */
+}
+
+/**
+ * __post_watch_notification - Post an event notification
+ * @wlist: The watch list to post the event to.
+ * @n: The notification record to post.
+ * @cred: The creds of the process that triggered the notification.
+ * @id: The ID to match on the watch.
+ *
+ * Post a notification of an event into a set of watch queues and let the users
+ * know.
+ *
+ * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and
+ * should be in units of sizeof(*n).
+ */
+void __post_watch_notification(struct watch_list *wlist,
+			       struct watch_notification *n,
+			       const struct cred *cred,
+			       u64 id)
+{
+	const struct watch_filter *wf;
+	struct watch_queue *wqueue;
+	struct watch *watch;
+
+	if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) {
+		WARN_ON(1);
+		return;
+	}
+
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) {
+		if (watch->id != id)
+			continue;
+		n->info &= ~WATCH_INFO_ID;
+		n->info |= watch->info_id;
+
+		wqueue = rcu_dereference(watch->queue);
+		wf = rcu_dereference(wqueue->filter);
+		if (wf && !filter_watch_notification(wf, n))
+			continue;
+
+		if (security_post_notification(watch->cred, cred, n) < 0)
+			continue;
+
+		post_one_notification(wqueue, n);
+	}
+
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(__post_watch_notification);
+
+/*
+ * Allow the queue to be polled.
+ */
+static __poll_t watch_queue_poll(struct file *file, poll_table *wait)
+{
+	struct watch_queue *wqueue = file->private_data;
+	struct watch_queue_buffer *buf = wqueue->buffer;
+	unsigned int head, tail;
+	__poll_t mask = 0;
+
+	if (!buf)
+		return EPOLLERR;
+
+	poll_wait(file, &wqueue->waiters, wait);
+
+	head = READ_ONCE(buf->meta.head);
+	tail = READ_ONCE(buf->meta.tail);
+	if (head != tail)
+		mask |= EPOLLIN | EPOLLRDNORM;
+	if (head - tail > wqueue->size)
+		mask |= EPOLLERR;
+	return mask;
+}
+
+static int watch_queue_set_page_dirty(struct page *page)
+{
+	SetPageDirty(page);
+	return 0;
+}
+
+static const struct address_space_operations watch_queue_aops = {
+	.set_page_dirty = watch_queue_set_page_dirty,
+};
+
+static vm_fault_t watch_queue_fault(struct vm_fault *vmf)
+{
+	struct watch_queue *wqueue = vmf->vma->vm_file->private_data;
+	struct page *page;
+
+	page = wqueue->pages[vmf->pgoff];
+	get_page(page);
+	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
+		put_page(page);
+		return VM_FAULT_RETRY;
+	}
+	vmf->page = page;
+	return VM_FAULT_LOCKED;
+}
+
+static int
watch_queue_account_mem(struct watch_queue *wqueue,
+				   unsigned long nr_pages)
+{
+	struct user_struct *user = wqueue->owner;
+	unsigned long page_limit, cur_pages, new_pages;
+
+	/* Don't allow more pages than we can safely lock */
+	page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	cur_pages = atomic_long_read(&user->locked_vm);
+
+	/* Retry until the compare-and-exchange on locked_vm succeeds */
+	do {
+		new_pages = cur_pages + nr_pages;
+		if (new_pages > page_limit && !capable(CAP_IPC_LOCK))
+			return -ENOMEM;
+	} while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages,
+						  new_pages));
+
+	wqueue->nr_pages = nr_pages;
+	return 0;
+}
+
+static void watch_queue_unaccount_mem(struct watch_queue *wqueue)
+{
+	struct user_struct *user = wqueue->owner;
+
+	if (wqueue->nr_pages) {
+		atomic_long_sub(wqueue->nr_pages, &user->locked_vm);
+		wqueue->nr_pages = 0;
+	}
+}
+
+static void watch_queue_map_pages(struct vm_fault *vmf,
+				  pgoff_t start_pgoff, pgoff_t end_pgoff)
+{
+	struct watch_queue *wqueue = vmf->vma->vm_file->private_data;
+	struct page *page;
+
+	rcu_read_lock();
+
+	do {
+		page = wqueue->pages[start_pgoff];
+		if (trylock_page(page)) {
+			vm_fault_t ret;
+			get_page(page);
+			ret = alloc_set_pte(vmf, NULL, page);
+			if (ret != 0)
+				put_page(page);
+
+			unlock_page(page);
+		}
+	} while (++start_pgoff < end_pgoff);
+
+	rcu_read_unlock();
+}
+
+static const struct vm_operations_struct watch_queue_vm_ops = {
+	.fault = watch_queue_fault,
+	.map_pages = watch_queue_map_pages,
+};
+
+/*
+ * Map the buffer.
+ */
+static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct watch_queue *wqueue = file->private_data;
+	struct inode *inode = file_inode(file);
+	u8 nr_pages;
+
+	inode_lock(inode);
+	nr_pages = wqueue->nr_pages;
+	inode_unlock(inode);
+
+	if (nr_pages == 0 ||
+	    vma->vm_pgoff != 0 ||
+	    vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE ||
+	    !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED)))
+		return -EINVAL;
+
+	vma->vm_flags |= VM_DONTEXPAND;
+	vma->vm_ops = &watch_queue_vm_ops;
+	return 0;
+}
+
+/*
+ * Allocate the required number of pages.
+ */
+static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages)
+{
+	struct watch_queue_buffer *buf;
+	unsigned int gran = WATCH_LENGTH_GRANULARITY;
+	unsigned int metalen = sizeof(buf->meta) / gran;
+	int i;
+
+	BUILD_BUG_ON(gran != sizeof(__u64));
+
+	if (wqueue->buffer)
+		return -EBUSY;
+
+	if (nr_pages == 0 ||
+	    nr_pages > 16 || /* TODO: choose a better hard limit */
+	    !is_power_of_2(nr_pages))
+		return -EINVAL;
+
+	if (watch_queue_account_mem(wqueue, nr_pages) < 0)
+		goto err;
+
+	wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!wqueue->pages)
+		goto err_unaccount;
+
+	for (i = 0; i < nr_pages; i++) {
+		wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!wqueue->pages[i])
+			goto err_some_pages;
+		wqueue->pages[i]->mapping = &wqueue->mapping;
+		SetPageUptodate(wqueue->pages[i]);
+	}
+
+	buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED);
+	if (!buf)
+		goto err_some_pages;
+
+	wqueue->buffer = buf;
+	wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification));
+
+	/* The first four slots in the buffer contain metadata about the ring,
+	 * including the head and tail indices and mask.
+	 */
+	buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT;
+	buf->meta.watch.type = WATCH_TYPE_META;
+	buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION;
+	buf->meta.mask = wqueue->size - 1;
+	buf->meta.head = metalen;
+	buf->meta.tail = metalen;
+	return 0;
+
+err_some_pages:
+	for (i--; i >= 0; i--) {
+		ClearPageUptodate(wqueue->pages[i]);
+		wqueue->pages[i]->mapping = NULL;
+		put_page(wqueue->pages[i]);
+	}
+
+	kfree(wqueue->pages);
+	wqueue->pages = NULL;
+err_unaccount:
+	watch_queue_unaccount_mem(wqueue);
+err:
+	return -ENOMEM;
+}
+
+/*
+ * Set the filter on a watch queue.
+ */
+static long watch_queue_set_filter(struct inode *inode,
+				   struct watch_queue *wqueue,
+				   struct watch_notification_filter __user *_filter)
+{
+	struct watch_notification_type_filter *tf;
+	struct watch_notification_filter filter;
+	struct watch_type_filter *q;
+	struct watch_filter *wfilter;
+	int ret, nr_filter = 0, i;
+
+	if (!_filter) {
+		/* Remove the old filter */
+		wfilter = NULL;
+		goto set;
+	}
+
+	/* Grab the user's filter specification */
+	if (copy_from_user(&filter, _filter, sizeof(filter)) != 0)
+		return -EFAULT;
+	if (filter.nr_filters == 0 ||
+	    filter.nr_filters > 16 ||
+	    filter.__reserved != 0)
+		return -EINVAL;
+
+	tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf));
+	if (IS_ERR(tf))
+		return PTR_ERR(tf);
+
+	ret = -EINVAL;
+	for (i = 0; i < filter.nr_filters; i++) {
+		if ((tf[i].info_filter & ~tf[i].info_mask) ||
+		    tf[i].info_mask & WATCH_INFO_LENGTH)
+			goto err_filter;
+		/* Ignore any unknown types */
+		if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
+			continue;
+		nr_filter++;
+	}
+
+	/* Now we need to build the internal filter from only the relevant
+	 * user-specified filters.
+	 */
+	ret = -ENOMEM;
+	wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL);
+	if (!wfilter)
+		goto err_filter;
+	wfilter->nr_filters = nr_filter;
+
+	q = wfilter->filters;
+	for (i = 0; i < filter.nr_filters; i++) {
+		/* Must skip exactly the types the counting loop skipped */
+		if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
+			continue;
+
+		q->type = tf[i].type;
+		q->info_filter = tf[i].info_filter;
+		q->info_mask = tf[i].info_mask;
+		q->subtype_filter[0] = tf[i].subtype_filter[0];
+		__set_bit(q->type, wfilter->type_filter);
+		q++;
+	}
+
+	kfree(tf);
+set:
+	inode_lock(inode);
+	rcu_swap_protected(wqueue->filter, wfilter,
+			   lockdep_is_held(&inode->i_rwsem));
+	inode_unlock(inode);
+	if (wfilter)
+		kfree_rcu(wfilter, rcu);
+	return 0;
+
+err_filter:
+	kfree(tf);
+	return ret;
+}
+
+/*
+ * Set parameters.
+ */
+static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct watch_queue *wqueue = file->private_data;
+	struct inode *inode = file_inode(file);
+	long ret;
+
+	switch (cmd) {
+	case IOC_WATCH_QUEUE_SET_SIZE:
+		inode_lock(inode);
+		ret = watch_queue_set_size(wqueue, arg);
+		inode_unlock(inode);
+		return ret;
+
+	case IOC_WATCH_QUEUE_SET_FILTER:
+		ret = watch_queue_set_filter(
+			inode, wqueue,
+			(struct watch_notification_filter __user *)arg);
+		return ret;
+
+	default:
+		return -ENOTTY;
+	}
+}
+
+/*
+ * Open the file.
+ */
+static int watch_queue_open(struct inode *inode, struct file *file)
+{
+	struct watch_queue *wqueue;
+
+	wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL);
+	if (!wqueue)
+		return -ENOMEM;
+
+	wqueue->mapping.a_ops = &watch_queue_aops;
+	wqueue->mapping.i_mmap = RB_ROOT_CACHED;
+	init_rwsem(&wqueue->mapping.i_mmap_rwsem);
+	spin_lock_init(&wqueue->mapping.private_lock);
+
+	kref_init(&wqueue->usage);
+	spin_lock_init(&wqueue->lock);
+	init_waitqueue_head(&wqueue->waiters);
+	wqueue->owner = get_uid(file->f_cred->user);
+
+	file->private_data = wqueue;
+	return 0;
+}
+
+static void __put_watch_queue(struct kref *kref)
+{
+	struct watch_queue *wqueue =
+		container_of(kref, struct watch_queue, usage);
+	struct watch_filter *wfilter;
+
+	wfilter = rcu_access_pointer(wqueue->filter);
+	if (wfilter)
+		kfree_rcu(wfilter, rcu);
+	free_uid(wqueue->owner);
+	kfree_rcu(wqueue, rcu);
+}
+
+/**
+ * put_watch_queue - Dispose of a ref on a watchqueue.
+ * @wqueue: The watch queue to unref.
+ */
+void put_watch_queue(struct watch_queue *wqueue)
+{
+	kref_put(&wqueue->usage, __put_watch_queue);
+}
+EXPORT_SYMBOL(put_watch_queue);
+
+static void free_watch(struct rcu_head *rcu)
+{
+	struct watch *watch = container_of(rcu, struct watch, rcu);
+
+	put_watch_queue(rcu_access_pointer(watch->queue));
+	put_cred(watch->cred);
+}
+
+static void __put_watch(struct kref *kref)
+{
+	struct watch *watch = container_of(kref, struct watch, usage);
+
+	call_rcu(&watch->rcu, free_watch);
+}
+
+/*
+ * Discard a watch.
+ */
+static void put_watch(struct watch *watch)
+{
+	kref_put(&watch->usage, __put_watch);
+}
+
+/**
+ * init_watch - Initialise a watch
+ * @watch: The watch to initialise.
+ * @wqueue: The queue to assign.
+ *
+ * Initialise a watch and set the watch queue.
+ */
+void init_watch(struct watch *watch, struct watch_queue *wqueue)
+{
+	kref_init(&watch->usage);
+	INIT_HLIST_NODE(&watch->list_node);
+	INIT_HLIST_NODE(&watch->queue_node);
+	rcu_assign_pointer(watch->queue, wqueue);
+}
+
+/**
+ * add_watch_to_object - Add a watch on an object to a watch list
+ * @watch: The watch to add
+ * @wlist: The watch list to add to
+ *
+ * @watch->queue must have been set to point to the queue to post notifications
+ * to and the watch list of the object to be watched.  @watch->cred must also
+ * have been set to the appropriate credentials and a ref taken on them.
+ *
+ * The caller must pin the queue and the list both and must hold the list
+ * locked against racing watch additions/removals.
+ */
+int add_watch_to_object(struct watch *watch, struct watch_list *wlist)
+{
+	struct watch_queue *wqueue = rcu_access_pointer(watch->queue);
+	struct watch *w;
+
+	hlist_for_each_entry(w, &wlist->watchers, list_node) {
+		struct watch_queue *wq = rcu_access_pointer(w->queue);
+		if (wqueue == wq && watch->id == w->id)
+			return -EBUSY;
+	}
+
+	watch->cred = get_current_cred();
+	rcu_assign_pointer(watch->watch_list, wlist);
+
+	spin_lock_bh(&wqueue->lock);
+	kref_get(&wqueue->usage);
+	hlist_add_head(&watch->queue_node, &wqueue->watches);
+	spin_unlock_bh(&wqueue->lock);
+
+	hlist_add_head(&watch->list_node, &wlist->watchers);
+	return 0;
+}
+EXPORT_SYMBOL(add_watch_to_object);
+
+/**
+ * remove_watch_from_object - Remove a watch or all watches from an object.
+ * @wlist: The watch list to remove from
+ * @wq: The watch queue of interest (ignored if @all is true)
+ * @id: The ID of the watch to remove (ignored if @all is true)
+ * @all: True to remove all objects
+ *
+ * Remove a specific watch or all watches from an object.  A notification is
+ * sent to the watcher to tell them that this happened.
+ */
+int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq,
+			     u64 id, bool all)
+{
+	struct watch_notification_removal n;
+	struct watch_queue *wqueue;
+	struct watch *watch;
+	int ret = -EBADSLT;
+
+	rcu_read_lock();
+
+again:
+	spin_lock(&wlist->lock);
+	hlist_for_each_entry(watch, &wlist->watchers, list_node) {
+		if (all ||
+		    (watch->id == id && rcu_access_pointer(watch->queue) == wq))
+			goto found;
+	}
+	spin_unlock(&wlist->lock);
+	goto out;
+
+found:
+	ret = 0;
+	hlist_del_init_rcu(&watch->list_node);
+	rcu_assign_pointer(watch->watch_list, NULL);
+	spin_unlock(&wlist->lock);
+
+	/* We now own the reference on watch that used to belong to wlist. */
+
+	n.watch.type = WATCH_TYPE_META;
+	n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION;
+	n.watch.info = watch->info_id | watch_sizeof(n.watch);
+	n.id = id;
+	if (id != 0)
+		n.watch.info = watch->info_id | watch_sizeof(n);
+
+	wqueue = rcu_dereference(watch->queue);
+
+	/* We don't need the watch list lock for the next bit as RCU is
+	 * protecting *wqueue from deallocation.
+	 */
+	if (wqueue) {
+		post_one_notification(wqueue, &n.watch);
+
+		spin_lock_bh(&wqueue->lock);
+
+		if (!hlist_unhashed(&watch->queue_node)) {
+			hlist_del_init_rcu(&watch->queue_node);
+			put_watch(watch);
+		}
+
+		spin_unlock_bh(&wqueue->lock);
+	}
+
+	if (wlist->release_watch) {
+		void (*release_watch)(struct watch *);
+
+		release_watch = wlist->release_watch;
+		rcu_read_unlock();
+		(*release_watch)(watch);
+		rcu_read_lock();
+	}
+	put_watch(watch);
+
+	if (all && !hlist_empty(&wlist->watchers))
+		goto again;
+out:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL(remove_watch_from_object);
+
+/*
+ * Remove all the watches that are contributory to a queue.  This has the
+ * potential to race with removal of the watches by the destruction of the
+ * objects being watched or with the distribution of notifications.
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */
+static int watch_queue_release(struct inode *inode, struct file *file)
+{
+	struct watch_queue *wqueue = file->private_data;
+	int i;
+
+	watch_queue_clear(wqueue);
+
+	if (wqueue->buffer)
+		vunmap(wqueue->buffer);
+
+	for (i = 0; i < wqueue->nr_pages; i++) {
+		ClearPageUptodate(wqueue->pages[i]);
+		wqueue->pages[i]->mapping = NULL;
+		__free_page(wqueue->pages[i]);
+	}
+
+	kfree(wqueue->pages);
+	watch_queue_unaccount_mem(wqueue);
+	put_watch_queue(wqueue);
+	return 0;
+}
+
+static const struct file_operations watch_queue_fops = {
+	.owner = THIS_MODULE,
+	.open = watch_queue_open,
+	.release = watch_queue_release,
+	.unlocked_ioctl = watch_queue_ioctl,
+	.poll = watch_queue_poll,
+	.mmap = watch_queue_mmap,
+	.llseek = no_llseek,
+};
+
+/**
+ * get_watch_queue - Get a watch queue from its file descriptor.
+ * @fd: The fd to query.
+ */
+struct watch_queue *get_watch_queue(int fd)
+{
+	struct watch_queue *wqueue = ERR_PTR(-EBADF);
+	struct fd f;
+
+	f = fdget(fd);
+	if (f.file) {
+		wqueue = ERR_PTR(-EINVAL);
+		if (f.file->f_op == &watch_queue_fops) {
+			wqueue = f.file->private_data;
+			kref_get(&wqueue->usage);
+		}
+		fdput(f);
+	}
+
+	return wqueue;
+}
+EXPORT_SYMBOL(get_watch_queue);
+
+static struct miscdevice watch_queue_dev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "watch_queue",
+	.fops = &watch_queue_fops,
+	.mode = 0666,
+};
+builtin_misc_device(watch_queue_dev);
diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h
new file mode 100644
index 000000000000..34d7915cc5b3
--- /dev/null
+++ b/include/linux/watch_queue.h
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+/* User-mappable watch queue
+ *
+ * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. + */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + 
+static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. + */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. 
+ * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
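As a reading aid for the per-rule test in filter_watch_notification() in the patch above, here is a minimal userspace sketch of the same three-part match: type equality, a subtype bit, and a masked compare on info. The model_* names are hypothetical and exist only for this illustration; they mirror the fields of struct watch_type_filter but are not the kernel definitions, and the sketch assumes only the first 32 subtypes are in use, since the internal filter keeps a single subtype word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one filter rule (not the kernel structure). */
struct model_type_filter {
	uint32_t type;			/* Record type the rule applies to */
	uint32_t subtype_filter[1];	/* Bit N set => subtype N is accepted */
	uint32_t info_filter;		/* Required value of the masked info bits */
	uint32_t info_mask;		/* Which info bits are compared */
};

/* Hypothetical model of the fields of a record that the filter looks at. */
struct model_record {
	uint32_t type;
	uint32_t subtype;
	uint32_t info;
};

/* Return true if any rule accepts the record; the default is to reject. */
static bool model_filter_match(const struct model_type_filter *rules, size_t nr,
			       const struct model_record *rec)
{
	size_t i;

	/* Assumption of this sketch: only one 32-bit subtype word exists. */
	if (rec->subtype >= 32)
		return false;

	for (i = 0; i < nr; i++) {
		const struct model_type_filter *r = &rules[i];

		if (rec->type == r->type &&
		    (r->subtype_filter[rec->subtype >> 5] &
		     (UINT32_C(1) << (rec->subtype & 31))) &&
		    (rec->info & r->info_mask) == r->info_filter)
			return true;
	}
	return false;
}
```

A rule with info_mask set to 0 ignores the info field entirely, which is why the patch rejects any rule whose info_filter has bits outside info_mask: such a rule could never match.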
* [PATCH 05/11] keys: Add a notification facility [ver #7]
From: David Howells @ 2019-08-30 13:57 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings,
	linux-usb, linux-security-module, linux-fsdevel, linux-api,
	linux-block, linux-security-module, linux-kernel

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created, with its size given in pages:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, 1 << n);

then a watch can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

After that, records will be placed into the queue when events occur in
which keys are changed in some way.  Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the third argument to
	keyctl_watch_key(), shifted.

	n->key_id will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key_id in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.
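To make the key_id/aux convention above concrete, the following is a small sketch of how a consumer might turn one key notification into text. It is an illustration only: the enum is a local copy of the subtype numbering that this patch adds to the uapi header (a real program would include <linux/watch_queue.h> instead), and describe_key_event() and key_event_names are hypothetical names that are not part of the patch or of keyutils.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copy of the subtype values from enum key_notification_subtype. */
enum {
	NOTIFY_KEY_INSTANTIATED	= 0,
	NOTIFY_KEY_UPDATED	= 1,
	NOTIFY_KEY_LINKED	= 2,
	NOTIFY_KEY_UNLINKED	= 3,
	NOTIFY_KEY_CLEARED	= 4,
	NOTIFY_KEY_REVOKED	= 5,
	NOTIFY_KEY_INVALIDATED	= 6,
	NOTIFY_KEY_SETATTR	= 7,
};

static const char *const key_event_names[] = {
	[NOTIFY_KEY_INSTANTIATED]	= "instantiated",
	[NOTIFY_KEY_UPDATED]		= "updated",
	[NOTIFY_KEY_LINKED]		= "linked",
	[NOTIFY_KEY_UNLINKED]		= "unlinked",
	[NOTIFY_KEY_CLEARED]		= "cleared",
	[NOTIFY_KEY_REVOKED]		= "revoked",
	[NOTIFY_KEY_INVALIDATED]	= "invalidated",
	[NOTIFY_KEY_SETATTR]		= "setattr",
};

/* Hypothetical helper: format one event into out[]; returns snprintf()'s result. */
static int describe_key_event(char *out, size_t outlen, unsigned int subtype,
			      unsigned int key_id, unsigned int aux)
{
	size_t nr = sizeof(key_event_names) / sizeof(key_event_names[0]);

	if (subtype >= nr)
		return snprintf(out, outlen, "key %u: unknown subtype %u",
				key_id, subtype);

	switch (subtype) {
	case NOTIFY_KEY_LINKED:
	case NOTIFY_KEY_UNLINKED:
		/* key_id names the watched keyring, aux the member key */
		return snprintf(out, outlen, "keyring %u: key %u %s",
				key_id, aux, key_event_names[subtype]);
	default:
		return snprintf(out, outlen, "key %u: %s",
				key_id, key_event_names[subtype]);
	}
}
```

Unknown subtypes are reported rather than dropped, on the assumption that later kernels may add events that an older consumer has not been taught about.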
Note also that the queue can be shared between multiple notifications of various types. Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 ++++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++- security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 99 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 271 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. 
+
+     Notification records look like::
+
+	struct key_notification {
+		struct watch_notification watch;
+		__u32	key_id;
+		__u32	aux;
+	};
+
+     In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be
+     one of::
+
+	NOTIFY_KEY_INSTANTIATED
+	NOTIFY_KEY_UPDATED
+	NOTIFY_KEY_LINKED
+	NOTIFY_KEY_UNLINKED
+	NOTIFY_KEY_CLEARED
+	NOTIFY_KEY_REVOKED
+	NOTIFY_KEY_INVALIDATED
+	NOTIFY_KEY_SETATTR
+
+     Where these indicate a key being instantiated/rejected, updated, a link
+     being made in a keyring, a link being removed from a keyring, a keyring
+     being cleared, a key being revoked, a key being invalidated or a key
+     having one of its attributes changed (user, group, perm, timeout,
+     restriction).
+
+     If a watched key is deleted, a basic watch_notification will be issued
+     with "type" set to WATCH_TYPE_META and "subtype" set to
+     WATCH_META_REMOVAL_NOTIFICATION.  The watchpoint ID will be set in the
+     "info" field.
+
+     This needs to be configured by enabling:
+
+	"Provide key/keyring change notifications" (KEY_NOTIFICATIONS)
+
+
 Kernel Services
 ===============
 
diff --git a/include/linux/key.h b/include/linux/key.h
index 50028338a4cc..b897ef4f7030 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -176,6 +176,9 @@ struct key {
 		struct list_head graveyard_link;
 		struct rb_node serial_node;
 	};
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	struct watch_list *watchers; /* Entities watching this key for changes */
+#endif
 	struct rw_semaphore sem; /* change vs change sem */
 	struct key_user *user; /* owner of this key */
 	void *security; /* security data for this key */
diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h
index ed3d5893830d..4c8884eea808 100644
--- a/include/uapi/linux/keyctl.h
+++ b/include/uapi/linux/keyctl.h
@@ -69,6 +69,7 @@
 #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */
 #define KEYCTL_MOVE 30 /* Move keys between keyrings */
 #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_type + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 --- 
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } 
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..6610649514fb 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ? 
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,90 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_key(key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + kfree(watch); +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities. 
*/ @@ -1857,6 +1947,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); + 
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed;
* [PATCH 05/11] keys: Add a notification facility [ver #7]
From: David Howells @ 2019-08-30 13:57 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a watch can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

After that, records will be placed into the queue whenever a watched key or
keyring is changed in some way. Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the third argument to
	keyctl_watch_key() (the watch ID, 0x01 above), shifted.

	n->key_id will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key_id in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length - or,
at least, the length may be dependent on the subtype. Note also that the
queue can be shared between multiple notifications of various types.

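As a rough illustration of how a consumer might interpret the record fields
described above, here is a minimal userspace sketch. It is not part of the
patch. The demo_* structures are simplified stand-ins for the uapi
definitions, and the DEMO_WATCH_INFO_ID mask and shift are assumptions made
for the example; real code would take these from <linux/watch_queue.h>. The
subtype numbering and the WATCH_TYPE_KEY_NOTIFY value follow the enums added
by this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Simplified stand-ins for illustration only.  Real code would include
 * <linux/watch_queue.h>; the field widths and the ID mask/shift below are
 * ASSUMPTIONS and may not match the uapi header.
 */
#define DEMO_WATCH_TYPE_KEY_NOTIFY	1u		/* value from this patch */
#define DEMO_WATCH_INFO_ID		0x00ff0000u	/* assumed mask */
#define DEMO_WATCH_INFO_ID_SHIFT	16		/* assumed shift */

struct demo_watch_notification {
	uint32_t type;		/* hypothetical: plain field, not a bitfield */
	uint32_t subtype;
	uint32_t info;
};

struct demo_key_notification {
	struct demo_watch_notification watch;
	uint32_t key_id;	/* the key/keyring affected */
	uint32_t aux;		/* per-subtype auxiliary data */
};

/* Map a key notification subtype (0..7 in this patch) to its name. */
const char *demo_key_subtype_name(uint32_t subtype)
{
	static const char *const names[] = {
		"NOTIFY_KEY_INSTANTIATED",
		"NOTIFY_KEY_UPDATED",
		"NOTIFY_KEY_LINKED",
		"NOTIFY_KEY_UNLINKED",
		"NOTIFY_KEY_CLEARED",
		"NOTIFY_KEY_REVOKED",
		"NOTIFY_KEY_INVALIDATED",
		"NOTIFY_KEY_SETATTR",
	};

	if (subtype < sizeof(names) / sizeof(names[0]))
		return names[subtype];
	return "unknown";
}

/* Extract the caller-chosen watch ID from the info word (assumed layout). */
unsigned int demo_watch_id(uint32_t info)
{
	return (info & DEMO_WATCH_INFO_ID) >> DEMO_WATCH_INFO_ID_SHIFT;
}

/*
 * Render a one-line description of a key notification.  Returns NULL for
 * records of any other type.  Uses a static buffer, so it is not reentrant.
 */
const char *demo_describe(const struct demo_key_notification *n)
{
	static char buf[128];

	assert(n != NULL);
	if (n->watch.type != DEMO_WATCH_TYPE_KEY_NOTIFY)
		return NULL;

	snprintf(buf, sizeof(buf), "key %u: %s (aux %u, watch id %u)",
		 (unsigned int)n->key_id,
		 demo_key_subtype_name(n->watch.subtype),
		 (unsigned int)n->aux,
		 demo_watch_id(n->watch.info));
	return buf;
}
```

A reader loop would typically call something like demo_describe() on each
record whose type matches, and skip over the others using the record length
taken from the info word.
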
Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 ++++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++- security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 99 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 271 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. 
+ + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + watch_meta_removal_notification. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services =============== diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */ 
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_type + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 --- 
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } 
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..6610649514fb 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ? 
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,90 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_key(key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + kfree(watch); +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities. 
*/ @@ -1857,6 +1947,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); + 
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed; ^ permalink raw reply related [flat|nested] 234+ messages in thread
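As a reading aid for the keyctl_watch_key() hunk above, the handling of the watch_id argument can be modelled in plain userspace C. This is only a sketch under stated assumptions: the demo_* names are hypothetical, and DEMO_INFO_ID_SHIFT is an assumed stand-in for WATCH_INFO_ID__SHIFT, whose real value is defined elsewhere in the series.

```c
#include <assert.h>

/* Assumed stand-in for WATCH_INFO_ID__SHIFT; the real value lives in
 * include/uapi/linux/watch_queue.h and is not shown in this patch. */
#define DEMO_INFO_ID_SHIFT 8

/* Hypothetical result codes, used by this sketch only. */
enum demo_action {
	DEMO_INVALID,	/* the kernel would return -EINVAL */
	DEMO_REMOVE,	/* watch_id == -1: remove the watch */
	DEMO_ADD,	/* 0 <= watch_id <= 0xff: add a watch */
};

/* Mirror the range check at the top of keyctl_watch_key(). */
static enum demo_action demo_classify_watch_id(int watch_id)
{
	if (watch_id < -1 || watch_id > 0xff)
		return DEMO_INVALID;
	return watch_id >= 0 ? DEMO_ADD : DEMO_REMOVE;
}

/* Mirror "watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT". */
static unsigned int demo_info_id(int watch_id)
{
	assert(demo_classify_watch_id(watch_id) == DEMO_ADD);
	return (unsigned int)watch_id << DEMO_INFO_ID_SHIFT;
}
```

Under these assumptions, demo_classify_watch_id(-1) selects the removal path and demo_info_id(0x01) yields the value that would later appear under WATCH_INFO_ID in a record's info field.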
* [PATCH 05/11] keys: Add a notification facility [ver #7]
@ 2019-08-30 13:57 ` David Howells
  0 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 13:57 UTC (permalink / raw)
  To: viro
  Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a watch can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

After that, records will be placed into the queue when events occur in
which keys are changed in some way.  Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the third argument to
	keyctl_watch_key(), shifted.

	n->key_id will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key_id in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.  Note also that
the queue can be shared between multiple notifications of various types.
Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 ++++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++- security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 99 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 271 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. 
+ + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + watch_meta_removal_notification. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services ======= diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */ 
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_type + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 ---
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + }
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..6610649514fb 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ?
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,90 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_key(key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + kfree(watch); +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities.
*/ @@ -1857,6 +1947,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); +
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed; ^ permalink raw reply related [flat|nested] 234+ messages in thread
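To make the record layout described in the patch 05 commit message more concrete, here is a minimal userspace sketch of decoding a NOTIFY_KEY_LINKED record. It is not the keyutils or kernel code: the demo_* names are hypothetical, the header struct is a simplified stand-in for struct watch_notification (defined elsewhere in the series), and the DEMO_INFO_ID mask and shift are assumed values standing in for WATCH_INFO_ID and WATCH_INFO_ID__SHIFT.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the enums added by the patch. */
#define DEMO_WATCH_TYPE_KEY_NOTIFY	1
#define DEMO_NOTIFY_KEY_LINKED		2

/* Assumed stand-ins for WATCH_INFO_ID and WATCH_INFO_ID__SHIFT. */
#define DEMO_INFO_ID		0x0000ff00u
#define DEMO_INFO_ID_SHIFT	8

/* Simplified header: the real struct watch_notification is defined
 * elsewhere in the series and may pack these fields differently. */
struct demo_watch_notification {
	uint32_t type;
	uint32_t subtype;
	uint32_t info;
};

struct demo_key_notification {
	struct demo_watch_notification watch;
	uint32_t key_id;	/* the key/keyring affected */
	uint32_t aux;		/* per-subtype auxiliary data */
};

/* Returns 1 and fills the outputs if *n is a "key linked" record, else 0. */
static int demo_decode_key_linked(const struct demo_key_notification *n,
				  unsigned int *watch_id,
				  uint32_t *keyring, uint32_t *linked_key)
{
	assert(n && watch_id && keyring && linked_key);

	if (n->watch.type != DEMO_WATCH_TYPE_KEY_NOTIFY ||
	    n->watch.subtype != DEMO_NOTIFY_KEY_LINKED)
		return 0;

	*watch_id = (n->watch.info & DEMO_INFO_ID) >> DEMO_INFO_ID_SHIFT;
	*keyring = n->key_id;	/* notify_key() posts the keyring's serial here */
	*linked_key = n->aux;	/* __key_link() passes key_serial(key) as aux */
	return 1;
}
```

A real consumer would additionally check the record length under WATCH_INFO_LENGTH before touching key_id and aux, since the commit message notes that records may be of variable length.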
* [PATCH 06/11] Add a general, global device notification watch list [ver #7]
  2019-08-30 13:57 ` David Howells
  (?)
@ 2019-08-30 13:58 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw)
  To: viro
  Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings,
	linux-usb, linux-security-module, linux-fsdevel, linux-api,
	linux-block, linux-security-module, linux-kernel

Create a general, global watch list that can be used for the posting of
device notification events, for such things as device attachment,
detachment and errors on sources such as block devices and USB devices.
This can be enabled with:

	CONFIG_DEVICE_NOTIFICATIONS

To add a watch on this list, an event queue must be created and configured:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

and then a watch can be placed upon it using a system call:

	watch_devices(fd, 12, 0);

Unless the application wants to receive all events, it should employ
appropriate filters.
For example, to receive just USB notifications, it could do:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_USB_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);

Signed-off-by: David Howells <dhowells@redhat.com>
--- Documentation/watch_queue.rst | 22 ++++++- arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/xtensa/kernel/syscalls/syscall.tbl | 1 drivers/base/Kconfig | 9 +++ drivers/base/Makefile | 1 drivers/base/watch.c | 90 +++++++++++++++++++++++++++ include/linux/device.h | 7 ++ include/linux/syscalls.h | 1 include/uapi/asm-generic/unistd.h | 4 + kernel/sys_ni.c | 1 24 files changed, 149 insertions(+), 2 deletions(-) create mode 100644 drivers/base/watch.c diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 6fb3aa3356d3..393905b904c8 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -276,6 +276,25 @@ The ``id`` is the ID of the source object (such as the serial number on a key). Only watches that have the same ID set in them will see this notification. +Global Device Watch List +======================== + +There is a global watch list that hardware generated events, such as device +connection, disconnection, failure and error can be posted upon.
It must be +enabled using:: + + CONFIG_DEVICE_NOTIFICATIONS + +Watchpoints are set in userspace using the device_notify(2) system call. +Within the kernel events are posted upon it using:: + + void post_device_notification(struct watch_notification *n, u64 id); + +where ``n`` is the formatted notification record to post. ``id`` is an +identifier that can be used to direct to specific watches, but it should be 0 +for general use on this queue. + + Watch Sources ============= @@ -291,7 +310,8 @@ Any particular buffer can be fed from multiple sources. Sources include: * WATCH_TYPE_BLOCK_NOTIFY Notifications of this type indicate block layer events, such as I/O errors - or temporary link loss. Watches of this type are set on a global queue. + or temporary link loss. Watches of this type are set on the global device + watch list. Event Filtering diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 728fe028c02c..8e841d8e4c22 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -475,3 +475,4 @@ 543 common fspick sys_fspick 544 common pidfd_open sys_pidfd_open # 545 reserved for clone3 +546 common watch_devices sys_watch_devices diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6da7dc4d79cc..0f080cf44cc9 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -449,3 +449,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 36d5faf4c86c..2f33f5db2fed 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -356,3 +356,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/m68k/kernel/syscalls/syscall.tbl 
b/arch/m68k/kernel/syscalls/syscall.tbl index a88a285a0e5f..83e4e8784b88 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -435,3 +435,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 09b0cd7dab0a..9a70a3be3b7b 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index c9c879ec9b6d..2ba5b649f0ab 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -374,3 +374,4 @@ 433 n32 fspick sys_fspick 434 n32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n32 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index bbce9159caa1..ff350988584d 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -350,3 +350,4 @@ 433 n64 fspick sys_fspick 434 n64 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n64 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 9653591428ec..7b26bd39900e 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -423,3 +423,4 @@ 433 o32 fspick sys_fspick 434 o32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 o32 watch_devices sys_watch_devices diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 670d1371aca1..d846365a4f7c 
100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -432,3 +432,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3_wrapper +436 common watch_devices sys_watch_devices diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 43f736ed47f2..0a503239ab5c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -517,3 +517,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 nospu clone3 ppc_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3054e9c035a3..19b43c0d928a 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick sys_fspick 434 common pidfd_open sys_pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 sys_clone3 +436 common watch_devices sys_watch_devices sys_watch_devices diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index b5ed26c4c005..b454e07c9372 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8c8cc7537fb2..8ef43c27457e 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -481,3 +481,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index c00019abd076..0e34ddeb97a1 100644 --- 
a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -440,3 +440,4 @@ 433 i386 fspick sys_fspick __ia32_sys_fspick 434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open 435 i386 clone3 sys_clone3 __ia32_sys_clone3 +436 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c29976eca4a8..29293d103829 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -357,6 +357,7 @@ 433 common fspick __x64_sys_fspick 434 common pidfd_open __x64_sys_pidfd_open 435 common clone3 __x64_sys_clone3/ptregs +436 common watch_devices __x64_sys_watch_devices # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 25f4de729a6d..243fa18b8d1e 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -406,3 +406,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index dc404492381d..7f899cae41a0 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -1,6 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 menu "Generic Driver Options" +config DEVICE_NOTIFICATIONS + bool "Provide device event notifications" + depends on WATCH_QUEUE + help + This option provides support for getting hardware event notifications + on devices, buses and interfaces. This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + device_notify(2) is used to set/remove watches. 
+ config UEVENT_HELPER bool "Support for uevent helper" help diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..4db2e8f1a1f4 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -7,6 +7,7 @@ obj-y := component.o core.o bus.o dd.o syscore.o \ attribute_container.o transport_class.o \ topology.o container.o property.o cacheinfo.o \ devcon.o swnode.o +obj-$(CONFIG_DEVICE_NOTIFICATIONS) += watch.o obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o diff --git a/drivers/base/watch.c b/drivers/base/watch.c new file mode 100644 index 000000000000..725aaa24275b --- /dev/null +++ b/drivers/base/watch.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Event notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/device.h> +#include <linux/watch_queue.h> +#include <linux/syscalls.h> +#include <linux/init_task.h> +#include <linux/security.h> + +/* + * Global queue for watching for device layer events. + */ +static struct watch_list device_watchers = { + .watchers = HLIST_HEAD_INIT, + .lock = __SPIN_LOCK_UNLOCKED(&device_watchers.lock), +}; + +static DEFINE_SPINLOCK(device_watchers_lock); + +/** + * post_device_notification - Post notification of a device event + * @n - The notification to post + * @id - The device ID + * + * Note that there's only a global queue to which all events are posted. Might + * want to provide per-dev queues also. + */ +void post_device_notification(struct watch_notification *n, u64 id) +{ + post_watch_notification(&device_watchers, n, &init_cred, id); +} +EXPORT_SYMBOL(post_device_notification); + +/** + * sys_watch_devices - Watch for device events. + * @watch_fd: The watch queue to send notifications to. 
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch) + * @flags: Flags (reserved for future) + */ +SYSCALL_DEFINE3(watch_devices, int, watch_fd, int, watch_id, unsigned int, flags) +{ + struct watch_queue *wqueue; + struct watch *watch = NULL; + long ret = -ENOMEM; + + if (watch_id < -1 || watch_id > 0xff || flags) + return -EINVAL; + + wqueue = get_watch_queue(watch_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err; + } + + if (watch_id >= 0) { + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wqueue; + + init_watch(watch, wqueue); + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_devices(); + if (ret < 0) + goto err_watch; + + spin_lock(&device_watchers_lock); + ret = add_watch_to_object(watch, &device_watchers); + spin_unlock(&device_watchers_lock); + if (ret == 0) + watch = NULL; + } else { + spin_lock(&device_watchers_lock); + ret = remove_watch_from_object(&device_watchers, wqueue, 0, + false); + spin_unlock(&device_watchers_lock); + } + +err_watch: + kfree(watch); +err_wqueue: + put_watch_queue(wqueue); +err: + return ret; +} diff --git a/include/linux/device.h b/include/linux/device.h index 6717adee33f0..9def6a53b598 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -43,6 +43,7 @@ struct iommu_group; struct iommu_fwspec; struct dev_pin_info; struct iommu_param; +struct watch_notification; struct bus_attribute { struct attribute attr; @@ -1412,6 +1413,12 @@ struct device_link *device_link_add(struct device *consumer, void device_link_del(struct device_link *link); void device_link_remove(void *consumer, struct device *supplier); +#ifdef CONFIG_DEVICE_NOTIFICATIONS +extern void post_device_notification(struct watch_notification *n, u64 id); +#else +static inline void post_device_notification(struct watch_notification *n, u64 id) {} +#endif + #ifndef dev_fmt #define dev_fmt(fmt) fmt #endif diff --git a/include/linux/syscalls.h 
b/include/linux/syscalls.h index 88145da7d140..5bac5daec51e 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags) asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); +asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1be0e798e362..fd63ff0196fd 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) #endif +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) #undef __NR_syscalls -#define __NR_syscalls 436 +#define __NR_syscalls 437 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 34b76895b81e..184ad68c087f 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents); COND_SYSCALL(io_uring_setup); COND_SYSCALL(io_uring_enter); COND_SYSCALL(io_uring_register); +COND_SYSCALL(watch_devices); /* fs/xattr.c */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
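The argument check and watch-ID packing at the top of sys_watch_devices() in drivers/base/watch.c can be tried out in userspace. The sketch below mirrors only those two statements. The value of WATCH_INFO_ID__SHIFT (16) and the WATCH_INFO_ID mask are assumptions, since this patch uses the shift without showing its definition, and the three function names are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values: the patch references WATCH_INFO_ID__SHIFT but does not
 * show where it is defined. */
#define WATCH_INFO_ID__SHIFT	16
#define WATCH_INFO_ID		0x00ff0000u

/* Same test as the first statement of sys_watch_devices():
 * -1 means "remove the watch", 0..255 is a watch ID, flags must be zero. */
int watch_devices_check_args(int watch_id, unsigned int flags)
{
	if (watch_id < -1 || watch_id > 0xff || flags)
		return -EINVAL;
	return 0;
}

/* Pack a watch ID the way the syscall fills in watch->info_id. */
uint32_t watch_info_id_encode(int watch_id)
{
	assert(watch_id >= 0 && watch_id <= 0xff);
	return (uint32_t)watch_id << WATCH_INFO_ID__SHIFT;
}

/* Hypothetical helper: recover the watch ID from a record's info word. */
unsigned int watch_info_id_decode(uint32_t info)
{
	return (info & WATCH_INFO_ID) >> WATCH_INFO_ID__SHIFT;
}
```

Under these assumptions, a watcher that called watch_devices(fd, 12, 0) would find 12 in the ID field of each record it receives, which is what lets one buffer distinguish several watches.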
* [PATCH 06/11] Add a general, global device notification watch list [ver #7]
@ 2019-08-30 13:58 ` David Howells
  0 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw)
  To: viro
  Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner

Create a general, global watch list that can be used for posting device
notification events, such as device attachment, detachment and errors, from
sources such as block devices and USB devices.  This can be enabled with:

	CONFIG_DEVICE_NOTIFICATIONS

To add a watch on this list, a watch queue must be created and configured:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

and then a watch can be placed upon it using a system call:

	watch_devices(fd, 12, 0);

Unless the application wants to receive all events, it should employ
appropriate filters.  For example, to receive just USB notifications, it could
do:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_USB_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);

Signed-off-by: David Howells <dhowells@redhat.com>
---

 Documentation/watch_queue.rst               |   22 ++++++-
 arch/alpha/kernel/syscalls/syscall.tbl      |    1
 arch/arm/tools/syscall.tbl                  |    1
 arch/ia64/kernel/syscalls/syscall.tbl       |    1
 arch/m68k/kernel/syscalls/syscall.tbl       |    1
 arch/microblaze/kernel/syscalls/syscall.tbl |    1
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1
 arch/parisc/kernel/syscalls/syscall.tbl     |    1
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1
 arch/s390/kernel/syscalls/syscall.tbl       |    1
 arch/sh/kernel/syscalls/syscall.tbl         |    1
 arch/sparc/kernel/syscalls/syscall.tbl      |    1
 arch/x86/entry/syscalls/syscall_32.tbl      |    1
 arch/x86/entry/syscalls/syscall_64.tbl      |    1
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1
 drivers/base/Kconfig                        |    9 +++
 drivers/base/Makefile                       |    1
 drivers/base/watch.c                        |   90 +++++++++++++++++++++++++++
 include/linux/device.h                      |    7 ++
 include/linux/syscalls.h                    |    1
 include/uapi/asm-generic/unistd.h           |    4 +
 kernel/sys_ni.c                             |    1
 24 files changed, 149 insertions(+), 2 deletions(-)
 create mode 100644 drivers/base/watch.c

diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 6fb3aa3356d3..393905b904c8 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -276,6 +276,25 @@ The ``id`` is the ID of the source object (such as the serial number on a key).
 Only watches that have the same ID set in them will see this notification.
 
 
+Global Device Watch List
+========================
+
+There is a global watch list that hardware-generated events, such as device
+connection, disconnection, failure and error, can be posted upon.  It must be
+enabled using::
+
+	CONFIG_DEVICE_NOTIFICATIONS
+
+Watchpoints are set in userspace using the watch_devices(2) system call.
+Within the kernel, events are posted upon it using::
+
+	void post_device_notification(struct watch_notification *n, u64 id);
+
+where ``n`` is the formatted notification record to post.  ``id`` is an
+identifier that can be used to direct to specific watches, but it should be 0
+for general use on this queue.
+
+
 Watch Sources
 =============
 
@@ -291,7 +310,8 @@ Any particular buffer can be fed from multiple sources.  Sources include:
   * WATCH_TYPE_BLOCK_NOTIFY
 
     Notifications of this type indicate block layer events, such as I/O errors
-    or temporary link loss.  Watches of this type are set on a global queue.
+    or temporary link loss.  Watches of this type are set on the global device
+    watch list.
Event Filtering diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 728fe028c02c..8e841d8e4c22 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -475,3 +475,4 @@ 543 common fspick sys_fspick 544 common pidfd_open sys_pidfd_open # 545 reserved for clone3 +546 common watch_devices sys_watch_devices diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6da7dc4d79cc..0f080cf44cc9 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -449,3 +449,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 36d5faf4c86c..2f33f5db2fed 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -356,3 +356,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index a88a285a0e5f..83e4e8784b88 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -435,3 +435,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 09b0cd7dab0a..9a70a3be3b7b 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 
c9c879ec9b6d..2ba5b649f0ab 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -374,3 +374,4 @@ 433 n32 fspick sys_fspick 434 n32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n32 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index bbce9159caa1..ff350988584d 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -350,3 +350,4 @@ 433 n64 fspick sys_fspick 434 n64 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n64 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 9653591428ec..7b26bd39900e 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -423,3 +423,4 @@ 433 o32 fspick sys_fspick 434 o32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 o32 watch_devices sys_watch_devices diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 670d1371aca1..d846365a4f7c 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -432,3 +432,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3_wrapper +436 common watch_devices sys_watch_devices diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 43f736ed47f2..0a503239ab5c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -517,3 +517,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 nospu clone3 ppc_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3054e9c035a3..19b43c0d928a 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl 
+++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick sys_fspick 434 common pidfd_open sys_pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 sys_clone3 +436 common watch_devices sys_watch_devices sys_watch_devices diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index b5ed26c4c005..b454e07c9372 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8c8cc7537fb2..8ef43c27457e 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -481,3 +481,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index c00019abd076..0e34ddeb97a1 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -440,3 +440,4 @@ 433 i386 fspick sys_fspick __ia32_sys_fspick 434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open 435 i386 clone3 sys_clone3 __ia32_sys_clone3 +436 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c29976eca4a8..29293d103829 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -357,6 +357,7 @@ 433 common fspick __x64_sys_fspick 434 common pidfd_open __x64_sys_pidfd_open 435 common clone3 __x64_sys_clone3/ptregs +436 common watch_devices __x64_sys_watch_devices # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl 
b/arch/xtensa/kernel/syscalls/syscall.tbl index 25f4de729a6d..243fa18b8d1e 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -406,3 +406,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index dc404492381d..7f899cae41a0 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -1,6 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 menu "Generic Driver Options" +config DEVICE_NOTIFICATIONS + bool "Provide device event notifications" + depends on WATCH_QUEUE + help + This option provides support for getting hardware event notifications + on devices, buses and interfaces. This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + device_notify(2) is used to set/remove watches. + config UEVENT_HELPER bool "Support for uevent helper" help diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..4db2e8f1a1f4 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -7,6 +7,7 @@ obj-y := component.o core.o bus.o dd.o syscore.o \ attribute_container.o transport_class.o \ topology.o container.o property.o cacheinfo.o \ devcon.o swnode.o +obj-$(CONFIG_DEVICE_NOTIFICATIONS) += watch.o obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o diff --git a/drivers/base/watch.c b/drivers/base/watch.c new file mode 100644 index 000000000000..725aaa24275b --- /dev/null +++ b/drivers/base/watch.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Event notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/device.h> +#include <linux/watch_queue.h> +#include <linux/syscalls.h> +#include <linux/init_task.h> +#include <linux/security.h> + +/* + * Global queue for watching for device layer events. + */ +static struct watch_list device_watchers = { + .watchers = HLIST_HEAD_INIT, + .lock = __SPIN_LOCK_UNLOCKED(&device_watchers.lock), +}; + +static DEFINE_SPINLOCK(device_watchers_lock); + +/** + * post_device_notification - Post notification of a device event + * @n - The notification to post + * @id - The device ID + * + * Note that there's only a global queue to which all events are posted. Might + * want to provide per-dev queues also. + */ +void post_device_notification(struct watch_notification *n, u64 id) +{ + post_watch_notification(&device_watchers, n, &init_cred, id); +} +EXPORT_SYMBOL(post_device_notification); + +/** + * sys_watch_devices - Watch for device events. + * @watch_fd: The watch queue to send notifications to. 
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch) + * @flags: Flags (reserved for future) + */ +SYSCALL_DEFINE3(watch_devices, int, watch_fd, int, watch_id, unsigned int, flags) +{ + struct watch_queue *wqueue; + struct watch *watch = NULL; + long ret = -ENOMEM; + + if (watch_id < -1 || watch_id > 0xff || flags) + return -EINVAL; + + wqueue = get_watch_queue(watch_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err; + } + + if (watch_id >= 0) { + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wqueue; + + init_watch(watch, wqueue); + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_devices(); + if (ret < 0) + goto err_watch; + + spin_lock(&device_watchers_lock); + ret = add_watch_to_object(watch, &device_watchers); + spin_unlock(&device_watchers_lock); + if (ret == 0) + watch = NULL; + } else { + spin_lock(&device_watchers_lock); + ret = remove_watch_from_object(&device_watchers, wqueue, 0, + false); + spin_unlock(&device_watchers_lock); + } + +err_watch: + kfree(watch); +err_wqueue: + put_watch_queue(wqueue); +err: + return ret; +} diff --git a/include/linux/device.h b/include/linux/device.h index 6717adee33f0..9def6a53b598 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -43,6 +43,7 @@ struct iommu_group; struct iommu_fwspec; struct dev_pin_info; struct iommu_param; +struct watch_notification; struct bus_attribute { struct attribute attr; @@ -1412,6 +1413,12 @@ struct device_link *device_link_add(struct device *consumer, void device_link_del(struct device_link *link); void device_link_remove(void *consumer, struct device *supplier); +#ifdef CONFIG_DEVICE_NOTIFICATIONS +extern void post_device_notification(struct watch_notification *n, u64 id); +#else +static inline void post_device_notification(struct watch_notification *n, u64 id) {} +#endif + #ifndef dev_fmt #define dev_fmt(fmt) fmt #endif diff --git a/include/linux/syscalls.h 
b/include/linux/syscalls.h index 88145da7d140..5bac5daec51e 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags) asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); +asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1be0e798e362..fd63ff0196fd 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) #endif +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) #undef __NR_syscalls -#define __NR_syscalls 436 +#define __NR_syscalls 437 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 34b76895b81e..184ad68c087f 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents); COND_SYSCALL(io_uring_setup); COND_SYSCALL(io_uring_enter); COND_SYSCALL(io_uring_register); +COND_SYSCALL(watch_devices); /* fs/xattr.c */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
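The filter in the commit message above sets subtype_filter[0] to UINT_MAX. As a rough illustration of what that selects, the sketch below assumes subtype_filter is a bitmap of eight 32-bit words, one bit per subtype, which matches the "up to 256 subtypes" in the cover letter but is not shown in this patch. struct type_filter_sketch is a local stand-in, not the uAPI structure, and the type value is a caller-supplied placeholder because the value of WATCH_TYPE_USB_NOTIFY is not given here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for one filter entry; the field layout is an assumption. */
struct type_filter_sketch {
	uint32_t type;
	uint32_t subtype_filter[8];	/* 8 x 32 bits = 256 subtypes */
};

/* Return true if the bit for @subtype is set in the filter's bitmap. */
bool subtype_passes(const struct type_filter_sketch *f, unsigned int subtype)
{
	assert(f != NULL);
	if (subtype > 255)
		return false;
	return (f->subtype_filter[subtype >> 5] >> (subtype & 31)) & 1u;
}

/* Build a filter shaped like the commit message's example: only word 0 of
 * the bitmap is filled in, all other words stay zero. */
struct type_filter_sketch usb_filter_sketch(uint32_t usb_type)
{
	struct type_filter_sketch f = {
		.type = usb_type,
		.subtype_filter[0] = UINT_MAX,
	};
	return f;
}
```

If the bitmap assumption holds, the example filter passes subtypes 0 to 31 of the chosen type and drops the rest; an application wanting every subtype would have to fill all eight words.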
* [PATCH 06/11] Add a general, global device notification watch list [ver #7]
@ 2019-08-30 13:58 ` David Howells
  0 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw)
  To: viro
  Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner

Create a general, global watch list that can be used for posting device
notification events, such as device attachment, detachment and errors, from
sources such as block devices and USB devices.  This can be enabled with:

	CONFIG_DEVICE_NOTIFICATIONS

To add a watch on this list, a watch queue must be created and configured:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

and then a watch can be placed upon it using a system call:

	watch_devices(fd, 12, 0);

Unless the application wants to receive all events, it should employ
appropriate filters.  For example, to receive just USB notifications, it could
do:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_USB_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);

Signed-off-by: David Howells <dhowells@redhat.com>
---

 Documentation/watch_queue.rst               |   22 ++++++-
 arch/alpha/kernel/syscalls/syscall.tbl      |    1
 arch/arm/tools/syscall.tbl                  |    1
 arch/ia64/kernel/syscalls/syscall.tbl       |    1
 arch/m68k/kernel/syscalls/syscall.tbl       |    1
 arch/microblaze/kernel/syscalls/syscall.tbl |    1
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1
 arch/parisc/kernel/syscalls/syscall.tbl     |    1
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1
 arch/s390/kernel/syscalls/syscall.tbl       |    1
 arch/sh/kernel/syscalls/syscall.tbl         |    1
 arch/sparc/kernel/syscalls/syscall.tbl      |    1
 arch/x86/entry/syscalls/syscall_32.tbl      |    1
 arch/x86/entry/syscalls/syscall_64.tbl      |    1
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1
 drivers/base/Kconfig                        |    9 +++
 drivers/base/Makefile                       |    1
 drivers/base/watch.c                        |   90 +++++++++++++++++++++++++++
 include/linux/device.h                      |    7 ++
 include/linux/syscalls.h                    |    1
 include/uapi/asm-generic/unistd.h           |    4 +
 kernel/sys_ni.c                             |    1
 24 files changed, 149 insertions(+), 2 deletions(-)
 create mode 100644 drivers/base/watch.c

diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 6fb3aa3356d3..393905b904c8 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -276,6 +276,25 @@ The ``id`` is the ID of the source object (such as the serial number on a key).
 Only watches that have the same ID set in them will see this notification.
 
 
+Global Device Watch List
+========================
+
+There is a global watch list that hardware-generated events, such as device
+connection, disconnection, failure and error, can be posted upon.  It must be
+enabled using::
+
+	CONFIG_DEVICE_NOTIFICATIONS
+
+Watchpoints are set in userspace using the watch_devices(2) system call.
+Within the kernel, events are posted upon it using::
+
+	void post_device_notification(struct watch_notification *n, u64 id);
+
+where ``n`` is the formatted notification record to post.  ``id`` is an
+identifier that can be used to direct to specific watches, but it should be 0
+for general use on this queue.
+
+
 Watch Sources
 =============
 
@@ -291,7 +310,8 @@ Any particular buffer can be fed from multiple sources.  Sources include:
   * WATCH_TYPE_BLOCK_NOTIFY
 
     Notifications of this type indicate block layer events, such as I/O errors
-    or temporary link loss.  Watches of this type are set on a global queue.
+    or temporary link loss.  Watches of this type are set on the global device
+    watch list.
Event Filtering diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 728fe028c02c..8e841d8e4c22 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -475,3 +475,4 @@ 543 common fspick sys_fspick 544 common pidfd_open sys_pidfd_open # 545 reserved for clone3 +546 common watch_devices sys_watch_devices diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6da7dc4d79cc..0f080cf44cc9 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -449,3 +449,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 36d5faf4c86c..2f33f5db2fed 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -356,3 +356,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index a88a285a0e5f..83e4e8784b88 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -435,3 +435,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 09b0cd7dab0a..9a70a3be3b7b 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 
c9c879ec9b6d..2ba5b649f0ab 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -374,3 +374,4 @@ 433 n32 fspick sys_fspick 434 n32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n32 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index bbce9159caa1..ff350988584d 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -350,3 +350,4 @@ 433 n64 fspick sys_fspick 434 n64 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n64 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 9653591428ec..7b26bd39900e 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -423,3 +423,4 @@ 433 o32 fspick sys_fspick 434 o32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 o32 watch_devices sys_watch_devices diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 670d1371aca1..d846365a4f7c 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -432,3 +432,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3_wrapper +436 common watch_devices sys_watch_devices diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 43f736ed47f2..0a503239ab5c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -517,3 +517,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 nospu clone3 ppc_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3054e9c035a3..19b43c0d928a 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl 
+++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick sys_fspick 434 common pidfd_open sys_pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 sys_clone3 +436 common watch_devices sys_watch_devices sys_watch_devices diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index b5ed26c4c005..b454e07c9372 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8c8cc7537fb2..8ef43c27457e 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -481,3 +481,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index c00019abd076..0e34ddeb97a1 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -440,3 +440,4 @@ 433 i386 fspick sys_fspick __ia32_sys_fspick 434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open 435 i386 clone3 sys_clone3 __ia32_sys_clone3 +436 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c29976eca4a8..29293d103829 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -357,6 +357,7 @@ 433 common fspick __x64_sys_fspick 434 common pidfd_open __x64_sys_pidfd_open 435 common clone3 __x64_sys_clone3/ptregs +436 common watch_devices __x64_sys_watch_devices # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl 
b/arch/xtensa/kernel/syscalls/syscall.tbl index 25f4de729a6d..243fa18b8d1e 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -406,3 +406,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index dc404492381d..7f899cae41a0 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -1,6 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 menu "Generic Driver Options" +config DEVICE_NOTIFICATIONS + bool "Provide device event notifications" + depends on WATCH_QUEUE + help + This option provides support for getting hardware event notifications + on devices, buses and interfaces. This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + device_notify(2) is used to set/remove watches. + config UEVENT_HELPER bool "Support for uevent helper" help diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..4db2e8f1a1f4 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -7,6 +7,7 @@ obj-y := component.o core.o bus.o dd.o syscore.o \ attribute_container.o transport_class.o \ topology.o container.o property.o cacheinfo.o \ devcon.o swnode.o +obj-$(CONFIG_DEVICE_NOTIFICATIONS) += watch.o obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o diff --git a/drivers/base/watch.c b/drivers/base/watch.c new file mode 100644 index 000000000000..725aaa24275b --- /dev/null +++ b/drivers/base/watch.c @@ -0,0 +1,90 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Event notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/device.h> +#include <linux/watch_queue.h> +#include <linux/syscalls.h> +#include <linux/init_task.h> +#include <linux/security.h> + +/* + * Global queue for watching for device layer events. + */ +static struct watch_list device_watchers = { + .watchers = HLIST_HEAD_INIT, + .lock = __SPIN_LOCK_UNLOCKED(&device_watchers.lock), +}; + +static DEFINE_SPINLOCK(device_watchers_lock); + +/** + * post_device_notification - Post notification of a device event + * @n - The notification to post + * @id - The device ID + * + * Note that there's only a global queue to which all events are posted. Might + * want to provide per-dev queues also. + */ +void post_device_notification(struct watch_notification *n, u64 id) +{ + post_watch_notification(&device_watchers, n, &init_cred, id); +} +EXPORT_SYMBOL(post_device_notification); + +/** + * sys_watch_devices - Watch for device events. + * @watch_fd: The watch queue to send notifications to. 
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch) + * @flags: Flags (reserved for future) + */ +SYSCALL_DEFINE3(watch_devices, int, watch_fd, int, watch_id, unsigned int, flags) +{ + struct watch_queue *wqueue; + struct watch *watch = NULL; + long ret = -ENOMEM; + + if (watch_id < -1 || watch_id > 0xff || flags) + return -EINVAL; + + wqueue = get_watch_queue(watch_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err; + } + + if (watch_id >= 0) { + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wqueue; + + init_watch(watch, wqueue); + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + + ret = security_watch_devices(); + if (ret < 0) + goto err_watch; + + spin_lock(&device_watchers_lock); + ret = add_watch_to_object(watch, &device_watchers); + spin_unlock(&device_watchers_lock); + if (ret == 0) + watch = NULL; + } else { + spin_lock(&device_watchers_lock); + ret = remove_watch_from_object(&device_watchers, wqueue, 0, + false); + spin_unlock(&device_watchers_lock); + } + +err_watch: + kfree(watch); +err_wqueue: + put_watch_queue(wqueue); +err: + return ret; +} diff --git a/include/linux/device.h b/include/linux/device.h index 6717adee33f0..9def6a53b598 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -43,6 +43,7 @@ struct iommu_group; struct iommu_fwspec; struct dev_pin_info; struct iommu_param; +struct watch_notification; struct bus_attribute { struct attribute attr; @@ -1412,6 +1413,12 @@ struct device_link *device_link_add(struct device *consumer, void device_link_del(struct device_link *link); void device_link_remove(void *consumer, struct device *supplier); +#ifdef CONFIG_DEVICE_NOTIFICATIONS +extern void post_device_notification(struct watch_notification *n, u64 id); +#else +static inline void post_device_notification(struct watch_notification *n, u64 id) {} +#endif + #ifndef dev_fmt #define dev_fmt(fmt) fmt #endif diff --git a/include/linux/syscalls.h 
b/include/linux/syscalls.h index 88145da7d140..5bac5daec51e 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags) asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); +asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1be0e798e362..fd63ff0196fd 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) #endif +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) #undef __NR_syscalls -#define __NR_syscalls 436 +#define __NR_syscalls 437 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 34b76895b81e..184ad68c087f 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents); COND_SYSCALL(io_uring_setup); COND_SYSCALL(io_uring_enter); COND_SYSCALL(io_uring_register); +COND_SYSCALL(watch_devices); /* fs/xattr.c */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
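The argument handling at the top of sys_watch_devices() in the patch above can be restated as a small userspace sketch. This is not part of the patch: the helper name check_watch_devices_args() and the DEMO_INFO_ID_SHIFT constant are made up for illustration, and the shift value of 8 is only an assumption about WATCH_INFO_ID__SHIFT, which is defined elsewhere in the series.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed stand-in for WATCH_INFO_ID__SHIFT; the real definition is in
 * include/uapi/linux/watch_queue.h and is not shown in this patch. */
#define DEMO_INFO_ID_SHIFT 8

/*
 * Mirror of the checks made on entry to sys_watch_devices():
 *  - watch_id must lie in -1..0xff (-1 asks for the watch to be removed),
 *  - flags is reserved and must be zero.
 * Returns 0 and fills *info_id on success, -EINVAL otherwise.
 * For a removal request *info_id is set to 0.
 */
static int check_watch_devices_args(int watch_id, unsigned int flags,
				    uint32_t *info_id)
{
	if (watch_id < -1 || watch_id > 0xff || flags)
		return -EINVAL;

	*info_id = watch_id >= 0 ?
		(uint32_t)watch_id << DEMO_INFO_ID_SHIFT : 0;
	return 0;
}
```

A caller would therefore pass an ID between 0 and 255 to add a watch, -1 to remove it, and always 0 for flags.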
* RE: [PATCH 06/11] Add a general, global device notification watch list [ver #7] 2019-08-30 13:58 ` David Howells (?) @ 2019-09-03 8:34 ` Yoshihiro Shimoda -1 siblings, 0 replies; 234+ messages in thread From: Yoshihiro Shimoda @ 2019-09-03 8:34 UTC (permalink / raw) To: David Howells, viro Cc: Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Hi, > From: David Howells, Sent: Friday, August 30, 2019 10:58 PM <snip> > --- > > Documentation/watch_queue.rst | 22 ++++++- > arch/alpha/kernel/syscalls/syscall.tbl | 1 > arch/arm/tools/syscall.tbl | 1 > arch/ia64/kernel/syscalls/syscall.tbl | 1 It seems to lack modification for arch/arm64. I'm not sure whether this is related, but my environment (R-Car H3 / r8a7795) cannot boot on next-20190902 which contains this patch. I found an issue on the patch 08/11, so I'll report on the email thread later. 
> arch/m68k/kernel/syscalls/syscall.tbl | 1 > arch/microblaze/kernel/syscalls/syscall.tbl | 1 > arch/mips/kernel/syscalls/syscall_n32.tbl | 1 > arch/mips/kernel/syscalls/syscall_n64.tbl | 1 > arch/mips/kernel/syscalls/syscall_o32.tbl | 1 > arch/parisc/kernel/syscalls/syscall.tbl | 1 > arch/powerpc/kernel/syscalls/syscall.tbl | 1 > arch/s390/kernel/syscalls/syscall.tbl | 1 > arch/sh/kernel/syscalls/syscall.tbl | 1 > arch/sparc/kernel/syscalls/syscall.tbl | 1 > arch/x86/entry/syscalls/syscall_32.tbl | 1 > arch/x86/entry/syscalls/syscall_64.tbl | 1 > arch/xtensa/kernel/syscalls/syscall.tbl | 1 > drivers/base/Kconfig | 9 +++ > drivers/base/Makefile | 1 > drivers/base/watch.c | 90 +++++++++++++++++++++++++++ > include/linux/device.h | 7 ++ > include/linux/syscalls.h | 1 > include/uapi/asm-generic/unistd.h | 4 + > kernel/sys_ni.c | 1 > 24 files changed, 149 insertions(+), 2 deletions(-) > create mode 100644 drivers/base/watch.c Best regards, Yoshihiro Shimoda ^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: [PATCH 06/11] Add a general, global device notification watch list [ver #7] 2019-08-30 13:58 ` David Howells (?) @ 2019-09-03 16:41 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-03 16:41 UTC (permalink / raw) To: Yoshihiro Shimoda Cc: dhowells, viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> wrote: > It seems to lack modification for arch/arm64. Fixed. David ^ permalink raw reply [flat|nested] 234+ messages in thread
* [PATCH 07/11] block: Add block layer notifications [ver #7] 2019-08-30 13:57 ` David Howells (?) @ 2019-08-30 13:58 ` David Howells -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw)
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel

Add a block layer notification mechanism whereby notifications about
block-layer events, such as I/O errors, can be reported to a monitoring
process asynchronously.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

Then a notification can be set up to report block notifications via that
queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_BLOCK_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	watch_devices(fd, 12, 0);

After that, records will be placed into the queue when, for example, errors
occur on a block device. Records are of the following format:

	struct block_notification {
		struct watch_notification watch;
		__u64	dev;
		__u64	sector;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_BLOCK_NOTIFY

	n->watch.subtype will be the type of notification, such as
	NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the second argument to
	watch_devices(), shifted.

	n->dev will be the device numbers munged together.

	n->sector will indicate the affected sector (if appropriate for the
	event).

Note that it is permissible for event records to be of variable length - or,
at least, the length may be dependent on the subtype.

Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/watch_queue.rst | 4 +++- block/Kconfig | 9 +++++++++ block/blk-core.c | 29 +++++++++++++++++++++++++++++ include/linux/blkdev.h | 15 +++++++++++++++ include/uapi/linux/watch_queue.h | 30 +++++++++++++++++++++++++++++- 5 files changed, 85 insertions(+), 2 deletions(-) diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 393905b904c8..5cc9c6924727 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -7,7 +7,9 @@ receive notifications from the kernel. This can be used in conjunction with:: * Key/keyring notifications - * General device event notifications + * General device event notifications, including:: + + * Block layer event notifications The notifications buffers can be enabled by: diff --git a/block/Kconfig b/block/Kconfig index 8b5f8e560eb4..cc93e4ca29a7 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -164,6 +164,15 @@ config BLK_SED_OPAL Enabling this option enables users to setup/unlock/lock Locking ranges for SED devices using the Opal protocol. +config BLK_NOTIFICATIONS + bool "Block layer event notifications" + depends on DEVICE_NOTIFICATIONS + help + This option provides support for getting block layer event + notifications. This makes use of the /dev/watch_queue misc device to + handle the notification buffer and provides the device_notify() system + call to enable/disable watches. 
+ menu "Partition Types" source "block/partitions/Kconfig" diff --git a/block/blk-core.c b/block/blk-core.c index d0cc6e14d2f0..8ab1e07aa311 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -181,6 +181,22 @@ static const struct { [BLK_STS_IOERR] = { -EIO, "I/O" }, }; +#ifdef CONFIG_BLK_NOTIFICATIONS +static const +enum block_notification_type blk_notifications[ARRAY_SIZE(blk_errors)] = { + [BLK_STS_TIMEOUT] = NOTIFY_BLOCK_ERROR_TIMEOUT, + [BLK_STS_NOSPC] = NOTIFY_BLOCK_ERROR_NO_SPACE, + [BLK_STS_TRANSPORT] = NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT, + [BLK_STS_TARGET] = NOTIFY_BLOCK_ERROR_CRITICAL_TARGET, + [BLK_STS_NEXUS] = NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS, + [BLK_STS_MEDIUM] = NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM, + [BLK_STS_PROTECTION] = NOTIFY_BLOCK_ERROR_PROTECTION, + [BLK_STS_RESOURCE] = NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE, + [BLK_STS_DEV_RESOURCE] = NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE, + [BLK_STS_IOERR] = NOTIFY_BLOCK_ERROR_IO, +}; +#endif + blk_status_t errno_to_blk_status(int errno) { int i; @@ -221,6 +237,19 @@ static void print_req_error(struct request *req, blk_status_t status, req->cmd_flags & ~REQ_OP_MASK, req->nr_phys_segments, IOPRIO_PRIO_CLASS(req->ioprio)); + +#ifdef CONFIG_BLK_NOTIFICATIONS + if (blk_notifications[idx]) { + struct block_notification n = { + .watch.type = WATCH_TYPE_BLOCK_NOTIFY, + .watch.subtype = blk_notifications[idx], + .watch.info = watch_sizeof(n), + .dev = req->rq_disk ? 
disk_devt(req->rq_disk) : 0, + .sector = blk_rq_pos(req), + }; + post_block_notification(&n); + } +#endif } static void req_bio_endio(struct request *rq, struct bio *bio, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1ef375dafb1c..5d856f670a8f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -27,6 +27,7 @@ #include <linux/percpu-refcount.h> #include <linux/scatterlist.h> #include <linux/blkzoned.h> +#include <linux/watch_queue.h> struct module; struct scsi_ioctl_command; @@ -1742,6 +1743,20 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq) } #endif /* CONFIG_BLK_DEV_ZONED */ +#ifdef CONFIG_BLK_NOTIFICATIONS +static inline void post_block_notification(struct block_notification *n) +{ + u64 id = 0; /* Might want to allow dev# here. */ + + post_device_notification(&n->watch, id); +} +#else +static inline void post_block_notification(struct block_notification *n) +{ +} +#endif + + #else /* CONFIG_BLOCK */ struct block_device; diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 654d4ba8b909..9a6c059af09d 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -11,7 +11,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ - WATCH_TYPE___NR = 2 + WATCH_TYPE_BLOCK_NOTIFY = 2, /* Block layer event notification */ + WATCH_TYPE___NR = 3 }; enum watch_meta_notification_subtype { @@ -124,4 +125,31 @@ struct key_notification { __u32 aux; /* Per-type auxiliary data */ }; +/* + * Type of block layer notification. 
+ */ +enum block_notification_type { + NOTIFY_BLOCK_ERROR_TIMEOUT = 1, /* Timeout error */ + NOTIFY_BLOCK_ERROR_NO_SPACE = 2, /* Critical space allocation error */ + NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT = 3, /* Recoverable transport error */ + NOTIFY_BLOCK_ERROR_CRITICAL_TARGET = 4, /* Critical target error */ + NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS = 5, /* Critical nexus error */ + NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM = 6, /* Critical medium error */ + NOTIFY_BLOCK_ERROR_PROTECTION = 7, /* Protection error */ + NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE = 8, /* Kernel resource error */ + NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE = 9, /* Device resource error */ + NOTIFY_BLOCK_ERROR_IO = 10, /* Other I/O error */ +}; + +/* + * Block layer notification record. + * - watch.type = WATCH_TYPE_BLOCK_NOTIFY + * - watch.subtype = enum block_notification_type + */ +struct block_notification { + struct watch_notification watch; /* WATCH_TYPE_BLOCK_NOTIFY */ + __u64 dev; /* Device number */ + __u64 sector; /* Affected sector */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
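As a rough illustration of how a monitoring process might interpret the fields of a block_notification record, the sketch below maps the subtype to a short label and splits the dev field into major and minor numbers. It is not taken from the patch: the function names and DEMO_MINORBITS are hypothetical, the labels are paraphrased from the comments in enum block_notification_type, and the major/minor split assumes the kernel-internal dev_t layout (20 minor bits), on the grounds that the patch fills dev from disk_devt().

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: dev carries a kernel-internal dev_t, i.e. major << 20 | minor. */
#define DEMO_MINORBITS 20

/* Extract the major number from the 64-bit dev field. */
static unsigned int demo_block_major(uint64_t dev)
{
	return (unsigned int)(dev >> DEMO_MINORBITS);
}

/* Extract the minor number from the 64-bit dev field. */
static unsigned int demo_block_minor(uint64_t dev)
{
	return (unsigned int)(dev & ((1u << DEMO_MINORBITS) - 1));
}

/*
 * Map a watch.subtype value to a short label. The indices follow the
 * numbering of enum block_notification_type in the patch (1..10);
 * anything else, including 0, is reported as "unknown".
 */
static const char *demo_block_subtype_name(unsigned int subtype)
{
	static const char *const names[] = {
		[1]  = "timeout",
		[2]  = "no space",
		[3]  = "recoverable transport",
		[4]  = "critical target",
		[5]  = "critical nexus",
		[6]  = "critical medium",
		[7]  = "protection",
		[8]  = "kernel resource",
		[9]  = "device resource",
		[10] = "I/O",
	};

	if (subtype >= sizeof(names) / sizeof(names[0]) || !names[subtype])
		return "unknown";
	return names[subtype];
}
```

A consumer would call these on each record whose watch.type is WATCH_TYPE_BLOCK_NOTIFY, after using watch.info & WATCH_INFO_LENGTH to find the record boundary.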
* [PATCH 07/11] block: Add block layer notifications [ver #7] @ 2019-08-30 13:58 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Add a block layer notification mechanism whereby notifications about block-layer events such as I/O errors, can be reported to a monitoring process asynchronously. Firstly, an event queue needs to be created: fd = open("/dev/event_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); then a notification can be set up to report block notifications via that queue: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_BLOCK_NOTIFY, .subtype_filter[0] = UINT_MAX; }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); watch_devices(fd, 12); After that, records will be placed into the queue when, for example, errors occur on a block device. Records are of the following format: struct block_notification { struct watch_notification watch; __u64 dev; __u64 sector; } *n; Where: n->watch.type will be WATCH_TYPE_BLOCK_NOTIFY n->watch.subtype will be the type of notification, such as NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM. n->watch.info & WATCH_INFO_LENGTH will indicate the length of the record. n->watch.info & WATCH_INFO_ID will be the second argument to watch_devices(), shifted. n->dev will be the device numbers munged together. n->sector will indicate the affected sector (if appropriate for the event). Note that it is permissible for event records to be of variable length - or, at least, the length may be dependent on the subtype. 

Signed-off-by: David Howells <dhowells@redhat.com>
---

 Documentation/watch_queue.rst    |    4 +++-
 block/Kconfig                    |    9 +++++++++
 block/blk-core.c                 |   29 +++++++++++++++++++++++++++++
 include/linux/blkdev.h           |   15 +++++++++++++++
 include/uapi/linux/watch_queue.h |   30 +++++++++++++++++++++++++++++-
 5 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 393905b904c8..5cc9c6924727 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -7,7 +7,9 @@ receive notifications from the kernel.  This can be used in conjunction with::
 
   * Key/keyring notifications
 
-  * General device event notifications
+  * General device event notifications, including::
+
+    * Block layer event notifications
 
 
 The notifications buffers can be enabled by:
diff --git a/block/Kconfig b/block/Kconfig
index 8b5f8e560eb4..cc93e4ca29a7 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -164,6 +164,15 @@ config BLK_SED_OPAL
 	Enabling this option enables users to setup/unlock/lock
 	Locking ranges for SED devices using the Opal protocol.
 
+config BLK_NOTIFICATIONS
+	bool "Block layer event notifications"
+	depends on DEVICE_NOTIFICATIONS
+	help
+	  This option provides support for getting block layer event
+	  notifications.  This makes use of the /dev/watch_queue misc device to
+	  handle the notification buffer and provides the device_notify() system
+	  call to enable/disable watches.
+
 menu "Partition Types"
 
 source "block/partitions/Kconfig"
diff --git a/block/blk-core.c b/block/blk-core.c
index d0cc6e14d2f0..8ab1e07aa311 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -181,6 +181,22 @@ static const struct {
 	[BLK_STS_IOERR]		= { -EIO,	"I/O" },
 };
 
+#ifdef CONFIG_BLK_NOTIFICATIONS
+static const
+enum block_notification_type blk_notifications[ARRAY_SIZE(blk_errors)] = {
+	[BLK_STS_TIMEOUT]	= NOTIFY_BLOCK_ERROR_TIMEOUT,
+	[BLK_STS_NOSPC]		= NOTIFY_BLOCK_ERROR_NO_SPACE,
+	[BLK_STS_TRANSPORT]	= NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT,
+	[BLK_STS_TARGET]	= NOTIFY_BLOCK_ERROR_CRITICAL_TARGET,
+	[BLK_STS_NEXUS]		= NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS,
+	[BLK_STS_MEDIUM]	= NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM,
+	[BLK_STS_PROTECTION]	= NOTIFY_BLOCK_ERROR_PROTECTION,
+	[BLK_STS_RESOURCE]	= NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE,
+	[BLK_STS_DEV_RESOURCE]	= NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE,
+	[BLK_STS_IOERR]		= NOTIFY_BLOCK_ERROR_IO,
+};
+#endif
+
 blk_status_t errno_to_blk_status(int errno)
 {
 	int i;
@@ -221,6 +237,19 @@ static void print_req_error(struct request *req, blk_status_t status,
 		req->cmd_flags & ~REQ_OP_MASK,
 		req->nr_phys_segments,
 		IOPRIO_PRIO_CLASS(req->ioprio));
+
+#ifdef CONFIG_BLK_NOTIFICATIONS
+	if (blk_notifications[idx]) {
+		struct block_notification n = {
+			.watch.type	= WATCH_TYPE_BLOCK_NOTIFY,
+			.watch.subtype	= blk_notifications[idx],
+			.watch.info	= watch_sizeof(n),
+			.dev		= req->rq_disk ? disk_devt(req->rq_disk) : 0,
+			.sector		= blk_rq_pos(req),
+		};
+		post_block_notification(&n);
+	}
+#endif
 }
 
 static void req_bio_endio(struct request *rq, struct bio *bio,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1ef375dafb1c..5d856f670a8f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,6 +27,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
 #include <linux/blkzoned.h>
+#include <linux/watch_queue.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -1742,6 +1743,20 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
 }
 #endif /* CONFIG_BLK_DEV_ZONED */
 
+#ifdef CONFIG_BLK_NOTIFICATIONS
+static inline void post_block_notification(struct block_notification *n)
+{
+	u64 id = 0; /* Might want to allow dev# here. */
+
+	post_device_notification(&n->watch, id);
+}
+#else
+static inline void post_block_notification(struct block_notification *n)
+{
+}
+#endif
+
+
 #else /* CONFIG_BLOCK */
 
 struct block_device;
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 654d4ba8b909..9a6c059af09d 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -11,7 +11,8 @@
 enum watch_notification_type {
 	WATCH_TYPE_META		= 0,	/* Special record */
 	WATCH_TYPE_KEY_NOTIFY	= 1,	/* Key change event notification */
-	WATCH_TYPE___NR		= 2
+	WATCH_TYPE_BLOCK_NOTIFY	= 2,	/* Block layer event notification */
+	WATCH_TYPE___NR		= 3
 };
 
 enum watch_meta_notification_subtype {
@@ -124,4 +125,31 @@ struct key_notification {
 	__u32	aux;		/* Per-type auxiliary data */
 };
 
+/*
+ * Type of block layer notification.
+ */
+enum block_notification_type {
+	NOTIFY_BLOCK_ERROR_TIMEOUT		= 1, /* Timeout error */
+	NOTIFY_BLOCK_ERROR_NO_SPACE		= 2, /* Critical space allocation error */
+	NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT = 3, /* Recoverable transport error */
+	NOTIFY_BLOCK_ERROR_CRITICAL_TARGET	= 4, /* Critical target error */
+	NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS	= 5, /* Critical nexus error */
+	NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM	= 6, /* Critical medium error */
+	NOTIFY_BLOCK_ERROR_PROTECTION		= 7, /* Protection error */
+	NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE	= 8, /* Kernel resource error */
+	NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE	= 9, /* Device resource error */
+	NOTIFY_BLOCK_ERROR_IO			= 10, /* Other I/O error */
+};
+
+/*
+ * Block layer notification record.
+ * - watch.type = WATCH_TYPE_BLOCK_NOTIFY
+ * - watch.subtype = enum block_notification_type
+ */
+struct block_notification {
+	struct watch_notification watch; /* WATCH_TYPE_BLOCK_NOTIFY */
+	__u64	dev;			/* Device number */
+	__u64	sector;			/* Affected sector */
+};
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */

^ permalink raw reply related	[flat|nested] 234+ messages in thread
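
For readers following the block patch above, here is a rough userspace-side sketch of how a consumer might interpret the fields of a block notification once it has pulled a record out of the buffer.  It is not taken from the patch set.  The subtype numbers are copied from enum block_notification_type in the patch; everything prefixed sketch_ is a hypothetical helper.  Two points are assumptions: that n->dev carries the kernel-internal device number as returned by disk_devt(), with the minor in the low 20 bits, and that the subtype filter is a 256-bit bitmap held in eight 32-bit words (the cover letter only says that types have up to 256 individually filterable subtypes).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names indexed by the subtype values listed in the patch (1..10). */
static const char *const sketch_block_names[] = {
	[1]  = "timeout",
	[2]  = "no space",
	[3]  = "recoverable transport error",
	[4]  = "critical target error",
	[5]  = "critical nexus error",
	[6]  = "critical medium error",
	[7]  = "protection error",
	[8]  = "kernel resource error",
	[9]  = "device resource error",
	[10] = "other I/O error",
};

/* Map n->watch.subtype to a printable name; 0 and unlisted values are unknown. */
static const char *sketch_block_subtype_name(unsigned int subtype)
{
	size_t n = sizeof(sketch_block_names) / sizeof(sketch_block_names[0]);

	if (subtype < n && sketch_block_names[subtype])
		return sketch_block_names[subtype];
	return "unknown";
}

/* ASSUMPTION: n->dev is (major << 20) | minor, the kernel-internal encoding. */
#define SKETCH_MINORBITS 20

static void sketch_split_dev(uint64_t dev, unsigned int *major,
			     unsigned int *minor)
{
	*major = (unsigned int)(dev >> SKETCH_MINORBITS);
	*minor = (unsigned int)(dev & ((UINT32_C(1) << SKETCH_MINORBITS) - 1));
}

/* ASSUMPTION: one bit per subtype, 32 subtypes per word, 8 words in total. */
static void sketch_filter_allow(uint32_t filter[8], unsigned int subtype)
{
	subtype &= 255;
	filter[subtype >> 5] |= UINT32_C(1) << (subtype & 31);
}

static int sketch_filter_allows(const uint32_t filter[8], unsigned int subtype)
{
	subtype &= 255;
	return (filter[subtype >> 5] >> (subtype & 31)) & 1;
}
```

Under that bitmap assumption, setting subtype_filter[0] to UINT_MAX as in the commit message would admit subtypes 0 to 31, which covers every block subtype currently defined; a watcher interested only in medium errors could instead set the single bit for subtype 6.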
* [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
  2019-08-30 13:57 ` David Howells
  (?)
@ 2019-08-30 13:58 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw)
  To: viro
  Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings,
	linux-usb, linux-security-module, linux-fsdevel, linux-api,
	linux-block, linux-security-module, linux-kernel

Add a USB subsystem notification mechanism whereby notifications about
hardware events, such as device connection, disconnection, reset and I/O
errors, can be reported to a monitoring process asynchronously.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a notification can be set up to report USB notifications via that
queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_USB_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	notify_devices(fd, 12);

After that, records will be placed into the queue when events occur on a
USB device or bus.  Records are of the following format:

	struct usb_notification {
		struct watch_notification watch;
		__u32	error;
		__u32	reserved;
		__u8	name_len;
		__u8	name[0];
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_USB_NOTIFY

	n->watch.subtype will be the type of notification, such as
	NOTIFY_USB_DEVICE_ADD.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the second argument to
	device_notify(), shifted.

	n->error and n->reserved are intended to convey information such as
	error codes, but are currently not used.

	n->name_len and n->name convey the USB device name as an
	unterminated string.  This may be truncated - it is currently
	limited to a maximum 63 chars.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: linux-usb@vger.kernel.org
---

 Documentation/watch_queue.rst    |    9 ++++++
 drivers/usb/core/Kconfig         |    9 ++++++
 drivers/usb/core/devio.c         |   56 ++++++++++++++++++++++++++++++++++++++
 drivers/usb/core/hub.c           |    4 +++
 include/linux/usb.h              |   18 ++++++++++++
 include/uapi/linux/watch_queue.h |   30 ++++++++++++++++++++
 6 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 5cc9c6924727..4087a8e670a8 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -11,6 +11,8 @@ receive notifications from the kernel.  This can be used in conjunction with::
 
     * Block layer event notifications
 
+  * USB subsystem event notifications
+
 
 The notifications buffers can be enabled by:
 
@@ -315,6 +317,13 @@ Any particular buffer can be fed from multiple sources.  Sources include:
     or temporary link loss.  Watches of this type are set on the global device
     watch list.
 
+  * WATCH_TYPE_USB_NOTIFY
+
+    Notifications of this type indicate USB subsystem events, such as
+    attachment, removal, reset and I/O errors.  Separate events are generated
+    for buses and devices.  Watchpoints of this type are set on the global
+    device watch list.
+
 
 Event Filtering
 ===============
diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig
index ecaacc8ed311..57e7b649e48b 100644
--- a/drivers/usb/core/Kconfig
+++ b/drivers/usb/core/Kconfig
@@ -102,3 +102,12 @@ config USB_AUTOSUSPEND_DELAY
 	  The default value Linux has always had is 2 seconds.  Change
 	  this value if you want a different delay and cannot modify
 	  the command line or module parameter.
+
+config USB_NOTIFICATIONS
+	bool "Provide USB hardware event notifications"
+	depends on USB && DEVICE_NOTIFICATIONS
+	help
+	  This option provides support for getting hardware event notifications
+	  on USB devices and interfaces.  This makes use of the
+	  /dev/watch_queue misc device to handle the notification buffer.
+	  device_notify(2) is used to set/remove watches.
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 9063ede411ae..b8572e4d6a1b 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -41,6 +41,7 @@
 #include <linux/dma-mapping.h>
 #include <asm/byteorder.h>
 #include <linux/moduleparam.h>
+#include <linux/watch_queue.h>
 
 #include "usb.h"
 
@@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev)
 	}
 }
 
+#ifdef CONFIG_USB_NOTIFICATIONS
+static noinline void post_usb_notification(const char *devname,
+					   enum usb_notification_type subtype,
+					   u32 error)
+{
+	unsigned int gran = WATCH_LENGTH_GRANULARITY;
+	unsigned int name_len, n_len;
+	u64 id = 0; /* Might want to put a dev# here. */
+
+	struct {
+		struct usb_notification n;
+		char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
+			       (sizeof(struct usb_notification) -
+				offsetof(struct usb_notification, name))];
+	} n;
+
+	name_len = strlen(devname);
+	name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
+	n_len = round_up(offsetof(struct usb_notification, name) + name_len,
+			 gran) / gran;
+
+	memset(&n, 0, sizeof(n));
+	memcpy(n.n.name, devname, n_len);
+
+	n.n.watch.type		= WATCH_TYPE_USB_NOTIFY;
+	n.n.watch.subtype	= subtype;
+	n.n.watch.info		= n_len;
+	n.n.error		= error;
+	n.n.name_len		= name_len;
+
+	post_device_notification(&n.n.watch, id);
+}
+
+void post_usb_device_notification(const struct usb_device *udev,
+				  enum usb_notification_type subtype, u32 error)
+{
+	post_usb_notification(dev_name(&udev->dev), subtype, error);
+}
+
+void post_usb_bus_notification(const struct usb_bus *ubus,
+			       enum usb_notification_type subtype, u32 error)
+{
+	post_usb_notification(ubus->bus_name, subtype, error);
+}
+#endif
+
 static int usbdev_notify(struct notifier_block *self,
 			       unsigned long action, void *dev)
 {
 	switch (action) {
 	case USB_DEVICE_ADD:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
 		break;
 	case USB_DEVICE_REMOVE:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
+		usbdev_remove(dev);
+		break;
+	case USB_BUS_ADD:
+		post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
+		break;
+	case USB_BUS_REMOVE:
+		post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
 		usbdev_remove(dev);
 		break;
 	}
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 236313f41f4a..e8ebacc15a32 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -29,6 +29,7 @@
 #include <linux/random.h>
 #include <linux/pm_qos.h>
 #include <linux/kobject.h>
+#include <linux/watch_queue.h>
 
 #include <linux/uaccess.h>
 #include <asm/byteorder.h>
@@ -4605,6 +4606,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
 				(udev->config) ? "reset" : "new", speed,
 				devnum, driver_name);
 
+	if (udev->config)
+		post_usb_device_notification(udev, NOTIFY_USB_DEVICE_RESET, 0);
+
 	/* Set up TT records, if needed */
 	if (hdev->tt) {
 		udev->tt = hdev->tt;
diff --git a/include/linux/usb.h b/include/linux/usb.h
index e87826e23d59..ddfb9dc2473e 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -26,6 +26,7 @@
 struct usb_device;
 struct usb_driver;
 struct wusb_dev;
+enum usb_notification_type;
 
 /*-------------------------------------------------------------------------*/
 
@@ -2010,6 +2011,23 @@ extern void usb_led_activity(enum usb_led_event ev);
 static inline void usb_led_activity(enum usb_led_event ev) {}
 #endif
 
+/*
+ * Notification functions.
+ */
+#ifdef CONFIG_USB_NOTIFICATIONS
+extern void post_usb_device_notification(const struct usb_device *udev,
+					 enum usb_notification_type subtype,
+					 u32 error);
+extern void post_usb_bus_notification(const struct usb_bus *ubus,
+				      enum usb_notification_type subtype,
+				      u32 error);
+#else
+static inline void post_usb_device_notification(const struct usb_device *udev,
+						unsigned int subtype, u32 error) {}
+static inline void post_usb_bus_notification(const struct usb_bus *ubus,
+					     unsigned int subtype, u32 error) {}
+#endif
+
 #endif  /* __KERNEL__ */
 
 #endif
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 9a6c059af09d..bc5183e10d8c 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -12,7 +12,8 @@ enum watch_notification_type {
 	WATCH_TYPE_META		= 0,	/* Special record */
 	WATCH_TYPE_KEY_NOTIFY	= 1,	/* Key change event notification */
 	WATCH_TYPE_BLOCK_NOTIFY	= 2,	/* Block layer event notification */
-	WATCH_TYPE___NR		= 3
+	WATCH_TYPE_USB_NOTIFY	= 3,	/* USB subsystem event notification */
+	WATCH_TYPE___NR		= 4
 };
 
 enum watch_meta_notification_subtype {
@@ -152,4 +153,31 @@ struct block_notification {
 	__u64	sector;			/* Affected sector */
 };
 
+/*
+ * Type of USB layer notification.
+ */
+enum usb_notification_type {
+	NOTIFY_USB_DEVICE_ADD		= 0, /* USB device added */
+	NOTIFY_USB_DEVICE_REMOVE	= 1, /* USB device removed */
+	NOTIFY_USB_BUS_ADD		= 2, /* USB bus added */
+	NOTIFY_USB_BUS_REMOVE		= 3, /* USB bus removed */
+	NOTIFY_USB_DEVICE_RESET		= 4, /* USB device reset */
+	NOTIFY_USB_DEVICE_ERROR		= 5, /* USB device error */
+};
+
+/*
+ * USB subsystem notification record.
+ * - watch.type = WATCH_TYPE_USB_NOTIFY
+ * - watch.subtype = enum usb_notification_type
+ */
+struct usb_notification {
+	struct watch_notification watch; /* WATCH_TYPE_USB_NOTIFY */
+	__u32	error;
+	__u32	reserved;
+	__u8	name_len;		/* Length of device name */
+	__u8	name[0];		/* Device name (padded to __u64, truncated at 63 chars) */
+};
+
+#define USB_NOTIFICATION_MAX_NAME_LEN 63
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */

^ permalink raw reply related	[flat|nested] 234+ messages in thread
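
The length arithmetic in post_usb_notification() above can be illustrated in isolation.  The following is a standalone sketch, not code from the patch: the sketch_ structures are local stand-ins for the uapi definitions, and two things are assumptions.  First, that struct watch_notification is 8 bytes (24-bit type, 8-bit subtype and a 32-bit info word, which matches the cover letter's 16 million types and 256 subtypes).  Second, that WATCH_LENGTH_GRANULARITY is 8 bytes, as suggested by the "padded to __u64" comment.  The second helper shows one way a consumer might turn the unterminated name into a C string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GRANULARITY  8	/* ASSUMPTION: stands in for WATCH_LENGTH_GRANULARITY */
#define SKETCH_MAX_NAME_LEN 63	/* USB_NOTIFICATION_MAX_NAME_LEN in the patch */

/* ASSUMPTION: 8-byte record header; the real layout may differ. */
struct sketch_watch_notification {
	uint32_t type_and_subtype;
	uint32_t info;
};

/* Mirrors the field order of struct usb_notification in the patch. */
struct sketch_usb_notification {
	struct sketch_watch_notification watch;
	uint32_t error;
	uint32_t reserved;
	uint8_t  name_len;
	uint8_t  name[];
};

/*
 * Record length in granularity units for a name of the given length:
 * clamp the name, add the fixed part up to the name field, round up.
 */
static size_t sketch_usb_record_units(size_t name_len)
{
	size_t bytes;

	if (name_len > SKETCH_MAX_NAME_LEN)
		name_len = SKETCH_MAX_NAME_LEN;
	bytes = offsetof(struct sketch_usb_notification, name) + name_len;
	return (bytes + SKETCH_GRANULARITY - 1) / SKETCH_GRANULARITY;
}

/*
 * Copy an unterminated name of name_len bytes into out, truncating to fit
 * and always NUL-terminating.  Returns the number of bytes copied.
 */
static size_t sketch_usb_copy_name(const uint8_t *name, uint8_t name_len,
				   char *out, size_t out_size)
{
	size_t len = name_len;

	if (out_size == 0)
		return 0;
	if (len > out_size - 1)
		len = out_size - 1;
	memcpy(out, name, len);
	out[len] = '\0';
	return len;
}
```

With these assumptions the fixed part before the name is 17 bytes, so a three-character name such as "1-1" gives a 20-byte payload rounded up to 24 bytes, or 3 units, and the longest permitted name (63 characters) gives 80 bytes, or 10 units.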
* [PATCH 08/11] usb: Add USB subsystem notifications [ver #7] @ 2019-08-30 13:58 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Add a USB subsystem notification mechanism whereby notifications about hardware events such as device connection, disconnection, reset and I/O errors, can be reported to a monitoring process asynchronously. Firstly, an event queue needs to be created: fd = open("/dev/event_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); then a notification can be set up to report USB notifications via that queue: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_USB_NOTIFY, .subtype_filter[0] = UINT_MAX; }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); notify_devices(fd, 12); After that, records will be placed into the queue when events occur on a USB device or bus. Records are of the following format: struct usb_notification { struct watch_notification watch; __u32 error; __u32 reserved; __u8 name_len; __u8 name[0]; } *n; Where: n->watch.type will be WATCH_TYPE_USB_NOTIFY n->watch.subtype will be the type of notification, such as NOTIFY_USB_DEVICE_ADD. n->watch.info & WATCH_INFO_LENGTH will indicate the length of the record. n->watch.info & WATCH_INFO_ID will be the second argument to device_notify(), shifted. n->error and n->reserved are intended to convey information such as error codes, but are currently not used n->name_len and n->name convey the USB device name as an unterminated string. This may be truncated - it is currently limited to a maximum 63 chars. Note that it is permissible for event records to be of variable length - or, at least, the length may be dependent on the subtype. 
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: linux-usb@vger.kernel.org --- Documentation/watch_queue.rst | 9 ++++++ drivers/usb/core/Kconfig | 9 ++++++ drivers/usb/core/devio.c | 56 ++++++++++++++++++++++++++++++++++++++ drivers/usb/core/hub.c | 4 +++ include/linux/usb.h | 18 ++++++++++++ include/uapi/linux/watch_queue.h | 30 ++++++++++++++++++++ 6 files changed, 125 insertions(+), 1 deletion(-) diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 5cc9c6924727..4087a8e670a8 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -11,6 +11,8 @@ receive notifications from the kernel. This can be used in conjunction with:: * Block layer event notifications + * USB subsystem event notifications + The notifications buffers can be enabled by: @@ -315,6 +317,13 @@ Any particular buffer can be fed from multiple sources. Sources include: or temporary link loss. Watches of this type are set on the global device watch list. + * WATCH_TYPE_USB_NOTIFY + + Notifications of this type indicate USB subsystem events, such as + attachment, removal, reset and I/O errors. Separate events are generated + for buses and devices. Watchpoints of this type are set on the global + device watch list. + Event Filtering =============== diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig index ecaacc8ed311..57e7b649e48b 100644 --- a/drivers/usb/core/Kconfig +++ b/drivers/usb/core/Kconfig @@ -102,3 +102,12 @@ config USB_AUTOSUSPEND_DELAY The default value Linux has always had is 2 seconds. Change this value if you want a different delay and cannot modify the command line or module parameter. + +config USB_NOTIFICATIONS + bool "Provide USB hardware event notifications" + depends on USB && DEVICE_NOTIFICATIONS + help + This option provides support for getting hardware event notifications + on USB devices and interfaces. 
This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + device_notify(2) is used to set/remove watches. diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 9063ede411ae..b8572e4d6a1b 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -41,6 +41,7 @@ #include <linux/dma-mapping.h> #include <asm/byteorder.h> #include <linux/moduleparam.h> +#include <linux/watch_queue.h> #include "usb.h" @@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev) } } +#ifdef CONFIG_USB_NOTIFICATIONS +static noinline void post_usb_notification(const char *devname, + enum usb_notification_type subtype, + u32 error) +{ + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int name_len, n_len; + u64 id = 0; /* Might want to put a dev# here. */ + + struct { + struct usb_notification n; + char more_name[USB_NOTIFICATION_MAX_NAME_LEN - + (sizeof(struct usb_notification) - + offsetof(struct usb_notification, name))]; + } n; + + name_len = strlen(devname); + name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN); + n_len = round_up(offsetof(struct usb_notification, name) + name_len, + gran) / gran; + + memset(&n, 0, sizeof(n)); + memcpy(n.n.name, devname, n_len); + + n.n.watch.type = WATCH_TYPE_USB_NOTIFY; + n.n.watch.subtype = subtype; + n.n.watch.info = n_len; + n.n.error = error; + n.n.name_len = name_len; + + post_device_notification(&n.n.watch, id); +} + +void post_usb_device_notification(const struct usb_device *udev, + enum usb_notification_type subtype, u32 error) +{ + post_usb_notification(dev_name(&udev->dev), subtype, error); +} + +void post_usb_bus_notification(const struct usb_bus *ubus, + enum usb_notification_type subtype, u32 error) +{ + post_usb_notification(ubus->bus_name, subtype, error); +} +#endif + static int usbdev_notify(struct notifier_block *self, unsigned long action, void *dev) { switch (action) { case USB_DEVICE_ADD: + post_usb_device_notification(dev, 
NOTIFY_USB_DEVICE_ADD, 0); break; case USB_DEVICE_REMOVE: + post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0); + usbdev_remove(dev); + break; + case USB_BUS_ADD: + post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0); + break; + case USB_BUS_REMOVE: + post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0); usbdev_remove(dev); break; } diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 236313f41f4a..e8ebacc15a32 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -29,6 +29,7 @@ #include <linux/random.h> #include <linux/pm_qos.h> #include <linux/kobject.h> +#include <linux/watch_queue.h> #include <linux/uaccess.h> #include <asm/byteorder.h> @@ -4605,6 +4606,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1, (udev->config) ? "reset" : "new", speed, devnum, driver_name); + if (udev->config) + post_usb_device_notification(udev, NOTIFY_USB_DEVICE_RESET, 0); + /* Set up TT records, if needed */ if (hdev->tt) { udev->tt = hdev->tt; diff --git a/include/linux/usb.h b/include/linux/usb.h index e87826e23d59..ddfb9dc2473e 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -26,6 +26,7 @@ struct usb_device; struct usb_driver; struct wusb_dev; +enum usb_notification_type; /*-------------------------------------------------------------------------*/ @@ -2010,6 +2011,23 @@ extern void usb_led_activity(enum usb_led_event ev); static inline void usb_led_activity(enum usb_led_event ev) {} #endif +/* + * Notification functions. 
+ */ +#ifdef CONFIG_USB_NOTIFICATIONS +extern void post_usb_device_notification(const struct usb_device *udev, + enum usb_notification_type subtype, + u32 error); +extern void post_usb_bus_notification(const struct usb_bus *ubus, + enum usb_notification_type subtype, + u32 error); +#else +static inline void post_usb_device_notification(const struct usb_device *udev, + unsigned int subtype, u32 error) {} +static inline void post_usb_bus_notification(const struct usb_bus *ubus, + unsigned int subtype, u32 error) {} +#endif + #endif /* __KERNEL__ */ #endif diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 9a6c059af09d..bc5183e10d8c 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -12,7 +12,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ WATCH_TYPE_BLOCK_NOTIFY = 2, /* Block layer event notification */ - WATCH_TYPE___NR = 3 + WATCH_TYPE_USB_NOTIFY = 3, /* USB subsystem event notification */ + WATCH_TYPE___NR = 4 }; enum watch_meta_notification_subtype { @@ -152,4 +153,31 @@ struct block_notification { __u64 sector; /* Affected sector */ }; +/* + * Type of USB layer notification. + */ +enum usb_notification_type { + NOTIFY_USB_DEVICE_ADD = 0, /* USB device added */ + NOTIFY_USB_DEVICE_REMOVE = 1, /* USB device removed */ + NOTIFY_USB_BUS_ADD = 2, /* USB bus added */ + NOTIFY_USB_BUS_REMOVE = 3, /* USB bus removed */ + NOTIFY_USB_DEVICE_RESET = 4, /* USB device reset */ + NOTIFY_USB_DEVICE_ERROR = 5, /* USB device error */ +}; + +/* + * USB subsystem notification record. 
+ * - watch.type = WATCH_TYPE_USB_NOTIFY + * - watch.subtype = enum usb_notification_type + */ +struct usb_notification { + struct watch_notification watch; /* WATCH_TYPE_USB_NOTIFY */ + __u32 error; + __u32 reserved; + __u8 name_len; /* Length of device name */ + __u8 name[0]; /* Device name (padded to __u64, truncated at 63 chars) */ +}; + +#define USB_NOTIFICATION_MAX_NAME_LEN 63 + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 08/11] usb: Add USB subsystem notifications [ver #7] @ 2019-08-30 13:58 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Add a USB subsystem notification mechanism whereby notifications about hardware events such as device connection, disconnection, reset and I/O errors, can be reported to a monitoring process asynchronously. Firstly, an event queue needs to be created: fd = open("/dev/event_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); then a notification can be set up to report USB notifications via that queue: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_USB_NOTIFY, .subtype_filter[0] = UINT_MAX; }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); notify_devices(fd, 12); After that, records will be placed into the queue when events occur on a USB device or bus. Records are of the following format: struct usb_notification { struct watch_notification watch; __u32 error; __u32 reserved; __u8 name_len; __u8 name[0]; } *n; Where: n->watch.type will be WATCH_TYPE_USB_NOTIFY n->watch.subtype will be the type of notification, such as NOTIFY_USB_DEVICE_ADD. n->watch.info & WATCH_INFO_LENGTH will indicate the length of the record. n->watch.info & WATCH_INFO_ID will be the second argument to device_notify(), shifted. n->error and n->reserved are intended to convey information such as error codes, but are currently not used n->name_len and n->name convey the USB device name as an unterminated string. This may be truncated - it is currently limited to a maximum 63 chars. Note that it is permissible for event records to be of variable length - or, at least, the length may be dependent on the subtype. 
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> cc: linux-usb@vger.kernel.org --- Documentation/watch_queue.rst | 9 ++++++ drivers/usb/core/Kconfig | 9 ++++++ drivers/usb/core/devio.c | 56 ++++++++++++++++++++++++++++++++++++++ drivers/usb/core/hub.c | 4 +++ include/linux/usb.h | 18 ++++++++++++ include/uapi/linux/watch_queue.h | 30 ++++++++++++++++++++ 6 files changed, 125 insertions(+), 1 deletion(-) diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 5cc9c6924727..4087a8e670a8 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -11,6 +11,8 @@ receive notifications from the kernel. This can be used in conjunction with:: * Block layer event notifications + * USB subsystem event notifications + The notifications buffers can be enabled by: @@ -315,6 +317,13 @@ Any particular buffer can be fed from multiple sources. Sources include: or temporary link loss. Watches of this type are set on the global device watch list. + * WATCH_TYPE_USB_NOTIFY + + Notifications of this type indicate USB subsystem events, such as + attachment, removal, reset and I/O errors. Separate events are generated + for buses and devices. Watchpoints of this type are set on the global + device watch list. + Event Filtering =======diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig index ecaacc8ed311..57e7b649e48b 100644 --- a/drivers/usb/core/Kconfig +++ b/drivers/usb/core/Kconfig @@ -102,3 +102,12 @@ config USB_AUTOSUSPEND_DELAY The default value Linux has always had is 2 seconds. Change this value if you want a different delay and cannot modify the command line or module parameter. + +config USB_NOTIFICATIONS + bool "Provide USB hardware event notifications" + depends on USB && DEVICE_NOTIFICATIONS + help + This option provides support for getting hardware event notifications + on USB devices and interfaces. 
This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + device_notify(2) is used to set/remove watches. diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 9063ede411ae..b8572e4d6a1b 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -41,6 +41,7 @@ #include <linux/dma-mapping.h> #include <asm/byteorder.h> #include <linux/moduleparam.h> +#include <linux/watch_queue.h> #include "usb.h" @@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev) } } +#ifdef CONFIG_USB_NOTIFICATIONS +static noinline void post_usb_notification(const char *devname, + enum usb_notification_type subtype, + u32 error) +{ + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int name_len, n_len; + u64 id = 0; /* Might want to put a dev# here. */ + + struct { + struct usb_notification n; + char more_name[USB_NOTIFICATION_MAX_NAME_LEN - + (sizeof(struct usb_notification) - + offsetof(struct usb_notification, name))]; + } n; + + name_len = strlen(devname); + name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN); + n_len = round_up(offsetof(struct usb_notification, name) + name_len, + gran) / gran; + + memset(&n, 0, sizeof(n)); + memcpy(n.n.name, devname, n_len); + + n.n.watch.type = WATCH_TYPE_USB_NOTIFY; + n.n.watch.subtype = subtype; + n.n.watch.info = n_len; + n.n.error = error; + n.n.name_len = name_len; + + post_device_notification(&n.n.watch, id); +} + +void post_usb_device_notification(const struct usb_device *udev, + enum usb_notification_type subtype, u32 error) +{ + post_usb_notification(dev_name(&udev->dev), subtype, error); +} + +void post_usb_bus_notification(const struct usb_bus *ubus, + enum usb_notification_type subtype, u32 error) +{ + post_usb_notification(ubus->bus_name, subtype, error); +} +#endif + static int usbdev_notify(struct notifier_block *self, unsigned long action, void *dev) { switch (action) { case USB_DEVICE_ADD: + post_usb_device_notification(dev, 
NOTIFY_USB_DEVICE_ADD, 0);
 		break;
 	case USB_DEVICE_REMOVE:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
+		usbdev_remove(dev);
+		break;
+	case USB_BUS_ADD:
+		post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
+		break;
+	case USB_BUS_REMOVE:
+		post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
 		usbdev_remove(dev);
 		break;
 	}
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 236313f41f4a..e8ebacc15a32 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -29,6 +29,7 @@
 #include <linux/random.h>
 #include <linux/pm_qos.h>
 #include <linux/kobject.h>
+#include <linux/watch_queue.h>
 
 #include <linux/uaccess.h>
 #include <asm/byteorder.h>
@@ -4605,6 +4606,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
 				(udev->config) ? "reset" : "new", speed,
 				devnum, driver_name);
 
+	if (udev->config)
+		post_usb_device_notification(udev, NOTIFY_USB_DEVICE_RESET, 0);
+
 	/* Set up TT records, if needed */
 	if (hdev->tt) {
 		udev->tt = hdev->tt;
diff --git a/include/linux/usb.h b/include/linux/usb.h
index e87826e23d59..ddfb9dc2473e 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -26,6 +26,7 @@
 struct usb_device;
 struct usb_driver;
 struct wusb_dev;
+enum usb_notification_type;
 
 /*-------------------------------------------------------------------------*/
 
@@ -2010,6 +2011,23 @@ extern void usb_led_activity(enum usb_led_event ev);
 static inline void usb_led_activity(enum usb_led_event ev) {}
 #endif
 
+/*
+ * Notification functions.
+ */
+#ifdef CONFIG_USB_NOTIFICATIONS
+extern void post_usb_device_notification(const struct usb_device *udev,
+					 enum usb_notification_type subtype,
+					 u32 error);
+extern void post_usb_bus_notification(const struct usb_bus *ubus,
+				      enum usb_notification_type subtype,
+				      u32 error);
+#else
+static inline void post_usb_device_notification(const struct usb_device *udev,
+						unsigned int subtype, u32 error) {}
+static inline void post_usb_bus_notification(const struct usb_bus *ubus,
+					     unsigned int subtype, u32 error) {}
+#endif
+
 #endif  /* __KERNEL__ */
 
 #endif
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 9a6c059af09d..bc5183e10d8c 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -12,7 +12,8 @@ enum watch_notification_type {
 	WATCH_TYPE_META = 0,		/* Special record */
 	WATCH_TYPE_KEY_NOTIFY = 1,	/* Key change event notification */
 	WATCH_TYPE_BLOCK_NOTIFY = 2,	/* Block layer event notification */
-	WATCH_TYPE___NR = 3
+	WATCH_TYPE_USB_NOTIFY = 3,	/* USB subsystem event notification */
+	WATCH_TYPE___NR = 4
 };
 
 enum watch_meta_notification_subtype {
@@ -152,4 +153,31 @@ struct block_notification {
 	__u64 sector;			/* Affected sector */
 };
 
+/*
+ * Type of USB layer notification.
+ */
+enum usb_notification_type {
+	NOTIFY_USB_DEVICE_ADD = 0,	/* USB device added */
+	NOTIFY_USB_DEVICE_REMOVE = 1,	/* USB device removed */
+	NOTIFY_USB_BUS_ADD = 2,		/* USB bus added */
+	NOTIFY_USB_BUS_REMOVE = 3,	/* USB bus removed */
+	NOTIFY_USB_DEVICE_RESET = 4,	/* USB device reset */
+	NOTIFY_USB_DEVICE_ERROR = 5,	/* USB device error */
+};
+
+/*
+ * USB subsystem notification record.
+ * - watch.type = WATCH_TYPE_USB_NOTIFY
+ * - watch.subtype = enum usb_notification_type
+ */
+struct usb_notification {
+	struct watch_notification watch; /* WATCH_TYPE_USB_NOTIFY */
+	__u32 error;
+	__u32 reserved;
+	__u8 name_len;			/* Length of device name */
+	__u8 name[0];			/* Device name (padded to __u64, truncated at 63 chars) */
+};
+
+#define USB_NOTIFICATION_MAX_NAME_LEN 63
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */

^ permalink raw reply related [flat|nested] 234+ messages in thread
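As an illustration of the record layout above, here is a minimal userspace sketch of how a USB notification record might be sized and filled. It is not code from the patch. The names usb_record_granules() and usb_record_fill() are hypothetical, GRANULE assumes that WATCH_LENGTH_GRANULARITY is 8 bytes (suggested by the "padded to __u64" comment), the 24/8-bit split of the watch header is an assumption, and the name field is a fixed 63-byte array here rather than the flexible array in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: records are measured in 8-byte granules ("padded to __u64"). */
#define GRANULE 8u
#define USB_NOTIFICATION_MAX_NAME_LEN 63u
#define WATCH_TYPE_USB_NOTIFY 3u

/* Local mirror of the record header; the bitfield split is an assumption. */
struct watch_notification {
	unsigned int type:24;
	unsigned int subtype:8;
	uint32_t info;		/* here: record length in granules */
};

/* Simplified mirror: a fixed-size name replaces the flexible array. */
struct usb_notification {
	struct watch_notification watch;
	uint32_t error;
	uint32_t reserved;
	uint8_t name_len;
	uint8_t name[USB_NOTIFICATION_MAX_NAME_LEN];
};

/* Hypothetical helper: granules needed for a name of the given length. */
static unsigned int usb_record_granules(size_t name_len)
{
	size_t bytes;

	if (name_len > USB_NOTIFICATION_MAX_NAME_LEN)
		name_len = USB_NOTIFICATION_MAX_NAME_LEN;
	bytes = offsetof(struct usb_notification, name) + name_len;
	return (unsigned int)((bytes + GRANULE - 1) / GRANULE);
}

/* Hypothetical helper: fill a record, truncating the name at 63 bytes. */
static unsigned int usb_record_fill(struct usb_notification *n,
				    const char *devname,
				    unsigned int subtype, uint32_t error)
{
	size_t name_len;

	assert(n != NULL && devname != NULL);
	name_len = strlen(devname);
	if (name_len > USB_NOTIFICATION_MAX_NAME_LEN)
		name_len = USB_NOTIFICATION_MAX_NAME_LEN;

	memset(n, 0, sizeof(*n));
	memcpy(n->name, devname, name_len);	/* copy name_len bytes of name */
	n->watch.type = WATCH_TYPE_USB_NOTIFY;
	n->watch.subtype = subtype;
	n->watch.info = usb_record_granules(name_len);
	n->error = error;
	n->name_len = (uint8_t)name_len;
	return n->watch.info;
}
```

In this sketch the name starts right after the one-byte name_len field, so on common ABIs a short name such as "usb1" still occupies a whole number of granules, and a consumer can step to the next record by advancing watch.info granules.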
* RE: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
  2019-08-30 13:58 ` David Howells
  (?)
@ 2019-09-03  8:53 ` Yoshihiro Shimoda
  -1 siblings, 0 replies; 234+ messages in thread
From: Yoshihiro Shimoda @ 2019-09-03  8:53 UTC (permalink / raw)
  To: David Howells, viro
  Cc: Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb,
	linux-security-module, linux-fsdevel, linux-api, linux-block,
	linux-security-module, linux-kernel

Hi,

> From: David Howells, Sent: Friday, August 30, 2019 10:58 PM
<snip>
> diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
> index 9063ede411ae..b8572e4d6a1b 100644
> --- a/drivers/usb/core/devio.c
> +++ b/drivers/usb/core/devio.c
> @@ -41,6 +41,7 @@
>  #include <linux/dma-mapping.h>
>  #include <asm/byteorder.h>
>  #include <linux/moduleparam.h>
> +#include <linux/watch_queue.h>
>
>  #include "usb.h"
>
> @@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev)
>  	}
>  }
>
> +#ifdef CONFIG_USB_NOTIFICATIONS
> +static noinline void post_usb_notification(const char *devname,
> +					   enum usb_notification_type subtype,
> +					   u32 error)
> +{
> +	unsigned int gran = WATCH_LENGTH_GRANULARITY;
> +	unsigned int name_len, n_len;
> +	u64 id = 0; /* Might want to put a dev# here. */
> +
> +	struct {
> +		struct usb_notification n;
> +		char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
> +			       (sizeof(struct usb_notification) -
> +				offsetof(struct usb_notification, name))];
> +	} n;
> +
> +	name_len = strlen(devname);
> +	name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
> +	n_len = round_up(offsetof(struct usb_notification, name) + name_len,
> +			 gran) / gran;
> +
> +	memset(&n, 0, sizeof(n));
> +	memcpy(n.n.name, devname, n_len);
> +
> +	n.n.watch.type = WATCH_TYPE_USB_NOTIFY;
> +	n.n.watch.subtype = subtype;
> +	n.n.watch.info = n_len;
> +	n.n.error = error;
> +	n.n.name_len = name_len;
> +
> +	post_device_notification(&n.n.watch, id);
> +}
> +
> +void post_usb_device_notification(const struct usb_device *udev,
> +				  enum usb_notification_type subtype, u32 error)
> +{
> +	post_usb_notification(dev_name(&udev->dev), subtype, error);
> +}
> +
> +void post_usb_bus_notification(const struct usb_bus *ubus,

This function's argument is struct usb_bus *, but ...

> +			       enum usb_notification_type subtype, u32 error)
> +{
> +	post_usb_notification(ubus->bus_name, subtype, error);
> +}
> +#endif
> +
>  static int usbdev_notify(struct notifier_block *self,
>  			       unsigned long action, void *dev)
>  {
>  	switch (action) {
>  	case USB_DEVICE_ADD:
> +		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
>  		break;
>  	case USB_DEVICE_REMOVE:
> +		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
> +		usbdev_remove(dev);
> +		break;
> +	case USB_BUS_ADD:
> +		post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
> +		break;
> +	case USB_BUS_REMOVE:
> +		post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
>  		usbdev_remove(dev);

this function calls usbdev_remove() with an incorrect argument if the
action is USB_BUS_REMOVE. This seems to cause the following issue [1] in
my environment (R-Car H3 / r8a7795 on next-20190902) [2]. However, I have
no idea how to fix the issue, so as a first step I am reporting it.
JFYI, even if I have reverted this patch on next-20190902, other issue appears [3]. [1] The following panic happened. [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd073] [ 0.000000] Linux version 5.3.0-rc6-next-20190902 (shimoda@shimoda-RB02198) (gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02)) #47 SMP PREEMPT Tue Sep 3 17:42:01 JST 2019 [ 0.000000] Machine model: Renesas Salvator-X board based on r8a7795 ES2.0+ [ 0.000000] printk: debug: ignoring loglevel setting. [ 0.000000] efi: Getting EFI parameters from FDT: [ 0.000000] efi: UEFI not found. [ 0.000000] cma: Reserved 32 MiB at 0x00000000be000000 [ 0.000000] NUMA: No NUMA configuration found [ 0.000000] NUMA: Faking a node at [mem 0x0000000048000000-0x000000077fffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x77efdb800-0x77efdcfff] [ 0.000000] Zone ranges: [ 0.000000] DMA32 [mem 0x0000000048000000-0x00000000ffffffff] [ 0.000000] Normal [mem 0x0000000100000000-0x000000077fffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000048000000-0x00000000bfffffff] [ 0.000000] node 0: [mem 0x0000000500000000-0x000000057fffffff] [ 0.000000] node 0: [mem 0x0000000600000000-0x000000067fffffff] [ 0.000000] node 0: [mem 0x0000000700000000-0x000000077fffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000048000000-0x000000077fffffff] [ 0.000000] On node 0 totalpages: 2064384 [ 0.000000] DMA32 zone: 7680 pages used for memmap [ 0.000000] DMA32 zone: 0 pages reserved [ 0.000000] DMA32 zone: 491520 pages, LIFO batch:63 [ 0.000000] Normal zone: 24576 pages used for memmap [ 0.000000] Normal zone: 1572864 pages, LIFO batch:63 [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv1.1 detected in firmware. 
[ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: Trusted OS migration not required [ 0.000000] psci: SMC Calling Convention v1.1 [ 0.000000] percpu: Embedded 22 pages/cpu s52952 r8192 d28968 u90112 [ 0.000000] pcpu-alloc: s52952 r8192 d28968 u90112 alloc=22*4096 [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 [ 0.000000] Detected PIPT I-cache on CPU0 [ 0.000000] CPU features: detected: EL2 vector hardening [ 0.000000] Speculative Store Bypass Disable mitigation not required [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 2032128 [ 0.000000] Policy zone: Normal [ 0.000000] Kernel command line: console=ttySC0,115200 ignore_loglevel consoleblank=0 rw root=/dev/nfs ip=dhcp [ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] software IO TLB: mapped [mem 0xba000000-0xbe000000] (64MB) [ 0.000000] Memory: 7972368K/8257536K available (12092K kernel code, 1846K rwdata, 6320K rodata, 4992K init, 450K bss, 252400K reserved, 32768K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8. [ 0.000000] Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8 [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f102f000 [ 0.000000] GIC: Using split EOI/Deactivate mode [ 0.000000] random: get_random_bytes called from start_kernel+0x2f0/0x490 with crng_init=0 [ 0.000000] arch_timer: cp15 timer(s) running at 8.33MHz (phys). 
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1ec02923e, max_idle_ns: 440795202125 ns [ 0.000003] sched_clock: 56 bits at 8MHz, resolution 120ns, wraps every 2199023255496ns [ 0.000142] Console: colour dummy device 80x25 [ 0.000211] Calibrating delay loop (skipped), value calculated using timer frequency.. 16.66 BogoMIPS (lpj=33333) [ 0.000218] pid_max: default: 32768 minimum: 301 [ 0.000273] LSM: Security Framework initializing [ 0.000351] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.000397] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.023974] ASID allocator initialised with 32768 entries [ 0.031963] rcu: Hierarchical SRCU implementation. [ 0.041031] Detected Renesas R-Car Gen3 r8a7795 ES3.0 [ 0.042354] EFI services will not be available. [ 0.047989] smp: Bringing up secondary CPUs ... [ 0.080173] Detected PIPT I-cache on CPU1 [ 0.080213] CPU1: Booted secondary processor 0x0000000001 [0x411fd073] [ 0.112190] Detected PIPT I-cache on CPU2 [ 0.112210] CPU2: Booted secondary processor 0x0000000002 [0x411fd073] [ 0.144225] Detected PIPT I-cache on CPU3 [ 0.144244] CPU3: Booted secondary processor 0x0000000003 [0x411fd073] [ 0.176267] CPU features: detected: ARM erratum 845719 [ 0.176278] Detected VIPT I-cache on CPU4 [ 0.176316] CPU4: Booted secondary processor 0x0000000100 [0x410fd034] [ 0.208292] Detected VIPT I-cache on CPU5 [ 0.208316] CPU5: Booted secondary processor 0x0000000101 [0x410fd034] [ 0.240331] Detected VIPT I-cache on CPU6 [ 0.240354] CPU6: Booted secondary processor 0x0000000102 [0x410fd034] [ 0.272365] Detected VIPT I-cache on CPU7 [ 0.272389] CPU7: Booted secondary processor 0x0000000103 [0x410fd034] [ 0.272464] smp: Brought up 1 node, 8 CPUs [ 0.272484] SMP: Total of 8 processors activated. 
[ 0.272488] CPU features: detected: 32-bit EL0 Support [ 0.272493] CPU features: detected: CRC32 instructions [ 0.282612] CPU: All CPU(s) started at EL2 [ 0.282644] alternatives: patching kernel code [ 0.283676] devtmpfs: initialized [ 0.289458] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns [ 0.289471] futex hash table entries: 2048 (order: 5, 131072 bytes, linear) [ 0.290163] pinctrl core: initialized pinctrl subsystem [ 0.291360] DMI not present or invalid. [ 0.291607] NET: Registered protocol family 16 [ 0.292388] DMA: preallocated 256 KiB pool for atomic allocations [ 0.292399] audit: initializing netlink subsys (disabled) [ 0.292539] audit: type=2000 audit(0.292:1): state=initialized audit_enabled=0 res=1 [ 0.293573] cpuidle: using governor menu [ 0.293733] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 0.294678] Serial: AMBA PL011 UART driver [ 0.296912] sh-pfc e6060000.pin-controller: IRQ index 0 not found [ 0.297125] sh-pfc e6060000.pin-controller: r8a77951_pfc support registered [ 0.317432] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [ 0.317439] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages [ 0.317443] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 0.317447] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages [ 0.319199] cryptd: max_cpu_qlen set to 1000 [ 0.322091] ACPI: Interpreter disabled. [ 0.325627] iommu: Default domain type: Translated [ 0.325823] vgaarb: loaded [ 0.326011] SCSI subsystem initialized [ 0.326113] libata version 3.00 loaded. [ 0.326243] usbcore: registered new interface driver usbfs [ 0.326264] usbcore: registered new interface driver hub [ 0.326307] usbcore: registered new device driver usb [ 0.327255] i2c-sh_mobile e60b0000.i2c: I2C adapter 7, bus speed 400000 Hz [ 0.327560] pps_core: LinuxPPS API ver. 1 registered [ 0.327564] pps_core: Software ver. 
5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it> [ 0.327573] PTP clock support registered [ 0.327701] EDAC MC: Ver: 3.0.0 [ 0.328991] FPGA manager framework [ 0.329031] Advanced Linux Sound Architecture Driver Initialized. [ 0.329497] clocksource: Switched to clocksource arch_sys_counter [ 0.329639] VFS: Disk quotas dquot_6.6.0 [ 0.329682] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.329800] pnp: PnP ACPI: disabled [ 0.332764] thermal_sys: Registered thermal governor 'step_wise' [ 0.332767] thermal_sys: Registered thermal governor 'power_allocator' [ 0.333270] NET: Registered protocol family 2 [ 0.333558] tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear) [ 0.333624] TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear) [ 0.333903] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear) [ 0.334489] TCP: Hash tables configured (established 65536 bind 65536) [ 0.334606] UDP hash table entries: 4096 (order: 5, 131072 bytes, linear) [ 0.334714] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear) [ 0.334929] NET: Registered protocol family 1 [ 0.335290] RPC: Registered named UNIX socket transport module. [ 0.335295] RPC: Registered udp transport module. [ 0.335299] RPC: Registered tcp transport module. [ 0.335302] RPC: Registered tcp NFSv4.1 backchannel transport module. 
[ 0.335311] PCI: CLS 0 bytes, default 64 [ 0.336141] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available [ 0.336377] hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available [ 0.336799] kvm [1]: IPA Size Limit: 40bits [ 0.337273] kvm [1]: vgic interrupt IRQ1 [ 0.337415] kvm [1]: Hyp mode initialized successfully [ 0.341775] Initialise system trusted keyrings [ 0.341864] workingset: timestamp_bits=44 max_order=21 bucket_order=0 [ 0.345076] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 0.345515] NFS: Registering the id_resolver key type [ 0.345532] Key type id_resolver registered [ 0.345535] Key type id_legacy registered [ 0.345544] nfs4filelayout_init: NFSv4 File Layout Driver Registering... [ 0.345638] 9p: Installing v9fs 9p2000 file system support [ 0.354995] Key type asymmetric registered [ 0.355001] Asymmetric key parser 'x509' registered [ 0.355027] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245) [ 0.355032] io scheduler mq-deadline registered [ 0.355036] io scheduler kyber registered [ 0.359639] phy_rcar_gen3_usb2 ee0a0200.usb-phy: IRQ index 0 not found [ 0.360346] phy_rcar_gen3_usb2 ee0c0200.usb-phy: IRQ index 0 not found [ 0.366010] gpio_rcar e6050000.gpio: driving 16 GPIOs [ 0.366187] gpio_rcar e6051000.gpio: driving 29 GPIOs [ 0.366348] gpio_rcar e6052000.gpio: driving 15 GPIOs [ 0.366504] gpio_rcar e6053000.gpio: driving 16 GPIOs [ 0.366663] gpio_rcar e6054000.gpio: driving 18 GPIOs [ 0.366816] gpio_rcar e6055000.gpio: driving 26 GPIOs [ 0.366973] gpio_rcar e6055400.gpio: driving 32 GPIOs [ 0.367126] gpio_rcar e6055800.gpio: driving 4 GPIOs [ 0.368571] rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges: [ 0.368596] rcar-pcie fe000000.pcie: IO 0xfe100000..0xfe1fffff -> 0x00000000 [ 0.368613] rcar-pcie fe000000.pcie: MEM 0xfe200000..0xfe3fffff -> 0xfe200000 [ 0.368626] rcar-pcie fe000000.pcie: MEM 0x30000000..0x37ffffff -> 0x30000000 [ 0.368635] rcar-pcie fe000000.pcie: 
MEM 0x38000000..0x3fffffff -> 0x38000000 [ 0.433003] rcar-pcie fe000000.pcie: PCIe link down [ 0.433148] rcar-pcie ee800000.pcie: host bridge /soc/pcie@ee800000 ranges: [ 0.433165] rcar-pcie ee800000.pcie: IO 0xee900000..0xee9fffff -> 0x00000000 [ 0.433179] rcar-pcie ee800000.pcie: MEM 0xeea00000..0xeebfffff -> 0xeea00000 [ 0.433191] rcar-pcie ee800000.pcie: MEM 0xc0000000..0xc7ffffff -> 0xc0000000 [ 0.433200] rcar-pcie ee800000.pcie: MEM 0xc8000000..0xcfffffff -> 0xc8000000 [ 0.496985] rcar-pcie ee800000.pcie: PCIe link down [ 0.498893] EINJ: ACPI disabled. [ 0.510430] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.512246] SuperH (H)SCI(F) driver initialized [ 0.512568] sh-sci e6550000.serial: IRQ index 1 not found [ 0.512577] sh-sci e6550000.serial: IRQ index 2 not found [ 0.512584] sh-sci e6550000.serial: IRQ index 3 not found [ 0.512591] sh-sci e6550000.serial: IRQ index 4 not found [ 0.512597] sh-sci e6550000.serial: IRQ index 5 not found [ 0.512647] e6550000.serial: ttySC1 at MMIO 0xe6550000 (irq = 34, base_baud = 0) is a hscif [ 0.513065] sh-sci e6e88000.serial: IRQ index 1 not found [ 0.513073] sh-sci e6e88000.serial: IRQ index 2 not found [ 0.513079] sh-sci e6e88000.serial: IRQ index 3 not found [ 0.513086] sh-sci e6e88000.serial: IRQ index 4 not found [ 0.513092] sh-sci e6e88000.serial: IRQ index 5 not found [ 0.513119] e6e88000.serial: ttySC0 at MMIO 0xe6e88000 (irq = 119, base_baud = 0) is a scif [ 1.655695] printk: console [ttySC0] enabled [ 1.660706] msm_serial: driver initialized [ 1.671544] loop: module loaded [ 1.679482] libphy: Fixed MDIO Bus: probed [ 1.683719] tun: Universal TUN/TAP device driver, 1.6 [ 1.689559] thunder_xcv, ver 1.0 [ 1.692805] thunder_bgx, ver 1.0 [ 1.696052] nicpf, ver 1.0 [ 1.699373] hclge is initializing [ 1.702688] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version [ 1.709907] hns3: Copyright (c) 2017 Huawei Corporation. 
[ 1.715242] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k [ 1.721073] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. [ 1.727012] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k [ 1.733971] igb: Copyright (c) 2007-2014 Intel Corporation. [ 1.739557] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k [ 1.747383] igbvf: Copyright (c) 2009 - 2012 Intel Corporation. [ 1.753636] sky2: driver version 1.30 [ 1.758264] VFIO - User Level meta-driver version: 0.3 [ 1.764783] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 1.771320] ehci-pci: EHCI PCI platform driver [ 1.775780] ehci-platform: EHCI generic platform driver [ 1.781335] ehci-platform ee0a0100.usb: EHCI Host Controller [ 1.787016] ehci-platform ee0a0100.usb: new USB bus registered, assigned bus number 1 [ 1.794935] ehci-platform ee0a0100.usb: irq 165, io mem 0xee0a0100 [ 1.813507] ehci-platform ee0a0100.usb: USB 2.0 started, EHCI 1.10 [ 1.820044] hub 1-0:1.0: USB hub found [ 1.823828] hub 1-0:1.0: 1 port detected [ 1.828017] ehci-platform ee0c0100.usb: EHCI Host Controller [ 1.833684] ehci-platform ee0c0100.usb: new USB bus registered, assigned bus number 2 [ 1.841560] ehci-platform ee0c0100.usb: irq 166, io mem 0xee0c0100 [ 1.861506] ehci-platform ee0c0100.usb: USB 2.0 started, EHCI 1.10 [ 1.867940] hub 2-0:1.0: USB hub found [ 1.871704] hub 2-0:1.0: 1 port detected [ 1.875860] ehci-orion: EHCI orion driver [ 1.880049] ehci-exynos: EHCI EXYNOS driver [ 1.884320] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver [ 1.890509] ohci-pci: OHCI PCI platform driver [ 1.894978] ohci-platform: OHCI generic platform driver [ 1.900444] ohci-platform ee0a0000.usb: Generic Platform OHCI controller [ 1.907159] ohci-platform ee0a0000.usb: new USB bus registered, assigned bus number 3 [ 1.915025] ohci-platform ee0a0000.usb: irq 165, io mem 0xee0a0000 [ 2.008477] hub 3-0:1.0: USB hub found [ 2.012244] hub 3-0:1.0: 1 port detected [ 2.016388] ohci-platform 
ee0c0000.usb: Generic Platform OHCI controller [ 2.023097] ohci-platform ee0c0000.usb: new USB bus registered, assigned bus number 4 [ 2.030977] ohci-platform ee0c0000.usb: irq 166, io mem 0xee0c0000 [ 2.124457] hub 4-0:1.0: USB hub found [ 2.128220] hub 4-0:1.0: 1 port detected [ 2.132361] ohci-exynos: OHCI EXYNOS driver [ 2.137069] xhci-hcd ee000000.usb: xHCI Host Controller [ 2.142305] xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 5 [ 2.149748] xhci-hcd ee000000.usb: Direct firmware load for r8a779x_usb3_v3.dlmem failed with error -2 [ 2.159063] xhci-hcd ee000000.usb: can't setup: -2 [ 2.163861] xhci-hcd ee000000.usb: USB bus 5 deregistered [ 2.169266] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [ 2.178042] Mem abort info: [ 2.180828] ESR = 0x96000004 [ 2.183876] EC = 0x25: DABT (current EL), IL = 32 bits [ 2.189179] SET = 0, FnV = 0 [ 2.192226] EA = 0, S1PTW = 0 [ 2.195358] Data abort info: [ 2.198231] ISV = 0, ISS = 0x00000004 [ 2.202058] CM = 0, WnR = 0 [ 2.205019] [0000000000000020] user address but active_mm is swapper [ 2.211366] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2.216930] Modules linked in: [ 2.219981] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc6-next-20190902 #47 [ 2.227456] Hardware name: Renesas Salvator-X board based on r8a7795 ES2.0+ (DT) [ 2.234844] pstate: a0000085 (NzCv daIf -PAN -UAO) [ 2.239638] pc : _raw_write_lock+0x68/0x288 [ 2.243819] lr : destroy_async+0x20/0xb0 [ 2.247733] sp : ffff80001006b9b0 [ 2.251040] x29: ffff80001006b9b0 x28: ffff800011186fd8 [ 2.256345] x27: ffff800011186fc0 x26: 00000000ffffffed [ 2.261650] x25: ffff8000118ff570 x24: ffff8000118ff000 [ 2.266955] x23: 0000000000000004 x22: ffff800011900000 [ 2.272259] x21: 0000000000000020 x20: 0000000000000028 [ 2.277564] x19: 0000000000000000 x18: 0000000000000005 [ 2.282868] x17: 0000000000000020 x16: ffff800010d164b0 [ 2.288172] x15: ffff8000118ff6e8 x14: ffff000735867958 [ 2.293476] x13: 
0000000000000000 x12: ffff8000118ff6e8 [ 2.298779] x11: ffff000735867908 x10: 0000000000000040 [ 2.304083] x9 : ffff8000118ff6f0 x8 : ffff8000118ff6e8 [ 2.309388] x7 : ffff000735867958 x6 : 0000000000000000 [ 2.314691] x5 : 0000000000000001 x4 : 0000000000000000 [ 2.319995] x3 : 0000000000000020 x2 : 0000000000000001 [ 2.325299] x1 : 0000000000000000 x0 : 0000000000000001 [ 2.330604] Call trace: [ 2.333045] _raw_write_lock+0x68/0x288 [ 2.336874] destroy_async+0x20/0xb0 [ 2.340443] usbdev_remove+0x3c/0xc0 [ 2.344011] usbdev_notify+0x20/0x38 [ 2.347583] notifier_call_chain+0x54/0x98 [ 2.351672] blocking_notifier_call_chain+0x48/0x70 [ 2.356543] usb_notify_remove_bus+0x1c/0x28 [ 2.360808] usb_deregister_bus+0x58/0x68 [ 2.364811] usb_add_hcd+0x234/0x730 [ 2.368381] xhci_plat_probe+0x4ec/0x650 [ 2.372302] platform_drv_probe+0x50/0xa0 [ 2.376305] really_probe+0xdc/0x350 [ 2.379874] driver_probe_device+0x58/0x100 [ 2.384050] device_driver_attach+0x6c/0x90 [ 2.388226] __driver_attach+0x84/0xc8 [ 2.391968] bus_for_each_dev+0x74/0xc8 [ 2.395796] driver_attach+0x20/0x28 [ 2.399365] bus_add_driver+0x148/0x1f0 [ 2.403193] driver_register+0x60/0x110 [ 2.407022] __platform_driver_register+0x40/0x48 [ 2.411723] xhci_plat_init+0x2c/0x34 [ 2.415380] do_one_initcall+0x5c/0x1b0 [ 2.419213] kernel_init_freeable+0x1a4/0x24c [ 2.423564] kernel_init+0x10/0x108 [ 2.427045] ret_from_fork+0x10/0x18 [ 2.430617] Code: 97d3f2b6 a8c17bfd d65f03c0 f9800071 (885ffc60) [ 2.436717] ---[ end trace 33e4fb349eb48047 ]--- [ 2.441345] note: swapper/0[1] exited with preempt_count 1 [ 2.446846] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 2.454497] SMP: stopping secondary CPUs [ 2.458416] Kernel Offset: disabled [ 2.461898] CPU features: 0x0002,21006004 [ 2.465899] Memory Limit: none [ 2.468950] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- [2] I'm using defconfig on arch/arm64 and disable CONFIG_FW_LOADER_USER_HELPER. 
[3] The following panic happened when I reverted the commit ef9cc255c9539288f119156412d23a4b785f3599 on next-20190902. [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd073] [ 0.000000] Linux version 5.3.0-rc6-next-20190902-00001-g9709468 (shimoda@shimoda-RB02198) (gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02)) #48 SMP PREEMPT Tue Sep 3 17:46:54 JST 2019 [ 0.000000] Machine model: Renesas Salvator-X board based on r8a7795 ES2.0+ [ 0.000000] printk: debug: ignoring loglevel setting. [ 0.000000] efi: Getting EFI parameters from FDT: [ 0.000000] efi: UEFI not found. [ 0.000000] cma: Reserved 32 MiB at 0x00000000be000000 [ 0.000000] NUMA: No NUMA configuration found [ 0.000000] NUMA: Faking a node at [mem 0x0000000048000000-0x000000077fffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x77efdb800-0x77efdcfff] [ 0.000000] Zone ranges: [ 0.000000] DMA32 [mem 0x0000000048000000-0x00000000ffffffff] [ 0.000000] Normal [mem 0x0000000100000000-0x000000077fffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000048000000-0x00000000bfffffff] [ 0.000000] node 0: [mem 0x0000000500000000-0x000000057fffffff] [ 0.000000] node 0: [mem 0x0000000600000000-0x000000067fffffff] [ 0.000000] node 0: [mem 0x0000000700000000-0x000000077fffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000048000000-0x000000077fffffff] [ 0.000000] On node 0 totalpages: 2064384 [ 0.000000] DMA32 zone: 7680 pages used for memmap [ 0.000000] DMA32 zone: 0 pages reserved [ 0.000000] DMA32 zone: 491520 pages, LIFO batch:63 [ 0.000000] Normal zone: 24576 pages used for memmap [ 0.000000] Normal zone: 1572864 pages, LIFO batch:63 [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv1.1 detected in firmware. 
[ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: Trusted OS migration not required [ 0.000000] psci: SMC Calling Convention v1.1 [ 0.000000] percpu: Embedded 22 pages/cpu s52952 r8192 d28968 u90112 [ 0.000000] pcpu-alloc: s52952 r8192 d28968 u90112 alloc=22*4096 [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 [ 0.000000] Detected PIPT I-cache on CPU0 [ 0.000000] CPU features: detected: EL2 vector hardening [ 0.000000] Speculative Store Bypass Disable mitigation not required [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 2032128 [ 0.000000] Policy zone: Normal [ 0.000000] Kernel command line: console=ttySC0,115200 ignore_loglevel consoleblank=0 rw root=/dev/nfs ip=dhcp [ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] software IO TLB: mapped [mem 0xba000000-0xbe000000] (64MB) [ 0.000000] Memory: 7972368K/8257536K available (12092K kernel code, 1846K rwdata, 6320K rodata, 4992K init, 450K bss, 252400K reserved, 32768K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8. [ 0.000000] Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8 [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f102f000 [ 0.000000] GIC: Using split EOI/Deactivate mode [ 0.000000] random: get_random_bytes called from start_kernel+0x2f0/0x490 with crng_init=0 [ 0.000000] arch_timer: cp15 timer(s) running at 8.33MHz (phys). 
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1ec02923e, max_idle_ns: 440795202125 ns [ 0.000002] sched_clock: 56 bits at 8MHz, resolution 120ns, wraps every 2199023255496ns [ 0.000141] Console: colour dummy device 80x25 [ 0.000207] Calibrating delay loop (skipped), value calculated using timer frequency.. 16.66 BogoMIPS (lpj=33333) [ 0.000215] pid_max: default: 32768 minimum: 301 [ 0.000270] LSM: Security Framework initializing [ 0.000352] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.000398] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.023978] ASID allocator initialised with 32768 entries [ 0.031968] rcu: Hierarchical SRCU implementation. [ 0.041040] Detected Renesas R-Car Gen3 r8a7795 ES3.0 [ 0.042364] EFI services will not be available. [ 0.047996] smp: Bringing up secondary CPUs ... [ 0.080183] Detected PIPT I-cache on CPU1 [ 0.080222] CPU1: Booted secondary processor 0x0000000001 [0x411fd073] [ 0.112195] Detected PIPT I-cache on CPU2 [ 0.112216] CPU2: Booted secondary processor 0x0000000002 [0x411fd073] [ 0.144232] Detected PIPT I-cache on CPU3 [ 0.144253] CPU3: Booted secondary processor 0x0000000003 [0x411fd073] [ 0.176276] CPU features: detected: ARM erratum 845719 [ 0.176286] Detected VIPT I-cache on CPU4 [ 0.176324] CPU4: Booted secondary processor 0x0000000100 [0x410fd034] [ 0.208297] Detected VIPT I-cache on CPU5 [ 0.208321] CPU5: Booted secondary processor 0x0000000101 [0x410fd034] [ 0.240338] Detected VIPT I-cache on CPU6 [ 0.240361] CPU6: Booted secondary processor 0x0000000102 [0x410fd034] [ 0.272375] Detected VIPT I-cache on CPU7 [ 0.272398] CPU7: Booted secondary processor 0x0000000103 [0x410fd034] [ 0.272473] smp: Brought up 1 node, 8 CPUs [ 0.272492] SMP: Total of 8 processors activated. 
[ 0.272497] CPU features: detected: 32-bit EL0 Support [ 0.272502] CPU features: detected: CRC32 instructions [ 0.282749] CPU: All CPU(s) started at EL2 [ 0.282777] alternatives: patching kernel code [ 0.283815] devtmpfs: initialized [ 0.289644] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns [ 0.289659] futex hash table entries: 2048 (order: 5, 131072 bytes, linear) [ 0.290353] pinctrl core: initialized pinctrl subsystem [ 0.291538] DMI not present or invalid. [ 0.291777] NET: Registered protocol family 16 [ 0.292561] DMA: preallocated 256 KiB pool for atomic allocations [ 0.292571] audit: initializing netlink subsys (disabled) [ 0.292709] audit: type=2000 audit(0.292:1): state=initialized audit_enabled=0 res=1 [ 0.293743] cpuidle: using governor menu [ 0.293898] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 0.294862] Serial: AMBA PL011 UART driver [ 0.297061] sh-pfc e6060000.pin-controller: IRQ index 0 not found [ 0.297280] sh-pfc e6060000.pin-controller: r8a77951_pfc support registered [ 0.317490] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [ 0.317498] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages [ 0.317503] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 0.317506] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages [ 0.319278] cryptd: max_cpu_qlen set to 1000 [ 0.322162] ACPI: Interpreter disabled. [ 0.325707] iommu: Default domain type: Translated [ 0.325909] vgaarb: loaded [ 0.326100] SCSI subsystem initialized [ 0.326199] libata version 3.00 loaded. [ 0.326333] usbcore: registered new interface driver usbfs [ 0.326353] usbcore: registered new interface driver hub [ 0.326395] usbcore: registered new device driver usb [ 0.327336] i2c-sh_mobile e60b0000.i2c: I2C adapter 7, bus speed 400000 Hz [ 0.327631] pps_core: LinuxPPS API ver. 1 registered [ 0.327636] pps_core: Software ver. 
5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it> [ 0.327644] PTP clock support registered [ 0.327770] EDAC MC: Ver: 3.0.0 [ 0.329047] FPGA manager framework [ 0.329089] Advanced Linux Sound Architecture Driver Initialized. [ 0.329548] clocksource: Switched to clocksource arch_sys_counter [ 0.329696] VFS: Disk quotas dquot_6.6.0 [ 0.329738] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.329855] pnp: PnP ACPI: disabled [ 0.332843] thermal_sys: Registered thermal governor 'step_wise' [ 0.332845] thermal_sys: Registered thermal governor 'power_allocator' [ 0.333337] NET: Registered protocol family 2 [ 0.333618] tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear) [ 0.333682] TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear) [ 0.333961] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear) [ 0.334551] TCP: Hash tables configured (established 65536 bind 65536) [ 0.334661] UDP hash table entries: 4096 (order: 5, 131072 bytes, linear) [ 0.334769] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear) [ 0.334979] NET: Registered protocol family 1 [ 0.335329] RPC: Registered named UNIX socket transport module. [ 0.335333] RPC: Registered udp transport module. [ 0.335337] RPC: Registered tcp transport module. [ 0.335340] RPC: Registered tcp NFSv4.1 backchannel transport module. 
[ 0.335349] PCI: CLS 0 bytes, default 64 [ 0.336179] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available [ 0.336413] hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available [ 0.336837] kvm [1]: IPA Size Limit: 40bits [ 0.337313] kvm [1]: vgic interrupt IRQ1 [ 0.337458] kvm [1]: Hyp mode initialized successfully [ 0.341834] Initialise system trusted keyrings [ 0.341924] workingset: timestamp_bits=44 max_order=21 bucket_order=0 [ 0.345148] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 0.345585] NFS: Registering the id_resolver key type [ 0.345602] Key type id_resolver registered [ 0.345606] Key type id_legacy registered [ 0.345615] nfs4filelayout_init: NFSv4 File Layout Driver Registering... [ 0.345714] 9p: Installing v9fs 9p2000 file system support [ 0.354814] Key type asymmetric registered [ 0.354819] Asymmetric key parser 'x509' registered [ 0.354844] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245) [ 0.354849] io scheduler mq-deadline registered [ 0.354853] io scheduler kyber registered [ 0.359476] phy_rcar_gen3_usb2 ee0a0200.usb-phy: IRQ index 0 not found [ 0.360174] phy_rcar_gen3_usb2 ee0c0200.usb-phy: IRQ index 0 not found [ 0.365881] gpio_rcar e6050000.gpio: driving 16 GPIOs [ 0.366057] gpio_rcar e6051000.gpio: driving 29 GPIOs [ 0.366222] gpio_rcar e6052000.gpio: driving 15 GPIOs [ 0.366376] gpio_rcar e6053000.gpio: driving 16 GPIOs [ 0.366536] gpio_rcar e6054000.gpio: driving 18 GPIOs [ 0.366688] gpio_rcar e6055000.gpio: driving 26 GPIOs [ 0.366860] gpio_rcar e6055400.gpio: driving 32 GPIOs [ 0.367015] gpio_rcar e6055800.gpio: driving 4 GPIOs [ 0.368473] rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges: [ 0.368498] rcar-pcie fe000000.pcie: IO 0xfe100000..0xfe1fffff -> 0x00000000 [ 0.368514] rcar-pcie fe000000.pcie: MEM 0xfe200000..0xfe3fffff -> 0xfe200000 [ 0.368527] rcar-pcie fe000000.pcie: MEM 0x30000000..0x37ffffff -> 0x30000000 [ 0.368536] rcar-pcie fe000000.pcie: 
MEM 0x38000000..0x3fffffff -> 0x38000000 [ 0.437037] rcar-pcie fe000000.pcie: PCIe link down [ 0.437187] rcar-pcie ee800000.pcie: host bridge /soc/pcie@ee800000 ranges: [ 0.437205] rcar-pcie ee800000.pcie: IO 0xee900000..0xee9fffff -> 0x00000000 [ 0.437218] rcar-pcie ee800000.pcie: MEM 0xeea00000..0xeebfffff -> 0xeea00000 [ 0.437230] rcar-pcie ee800000.pcie: MEM 0xc0000000..0xc7ffffff -> 0xc0000000 [ 0.437239] rcar-pcie ee800000.pcie: MEM 0xc8000000..0xcfffffff -> 0xc8000000 [ 0.501036] rcar-pcie ee800000.pcie: PCIe link down [ 0.502959] EINJ: ACPI disabled. [ 0.514458] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.516285] SuperH (H)SCI(F) driver initialized [ 0.516608] sh-sci e6550000.serial: IRQ index 1 not found [ 0.516616] sh-sci e6550000.serial: IRQ index 2 not found [ 0.516624] sh-sci e6550000.serial: IRQ index 3 not found [ 0.516630] sh-sci e6550000.serial: IRQ index 4 not found [ 0.516637] sh-sci e6550000.serial: IRQ index 5 not found [ 0.516685] e6550000.serial: ttySC1 at MMIO 0xe6550000 (irq = 34, base_baud = 0) is a hscif [ 0.517112] sh-sci e6e88000.serial: IRQ index 1 not found [ 0.517121] sh-sci e6e88000.serial: IRQ index 2 not found [ 0.517128] sh-sci e6e88000.serial: IRQ index 3 not found [ 0.517134] sh-sci e6e88000.serial: IRQ index 4 not found [ 0.517140] sh-sci e6e88000.serial: IRQ index 5 not found [ 0.517169] e6e88000.serial: ttySC0 at MMIO 0xe6e88000 (irq = 119, base_baud = 0) is a scif [ 1.661047] printk: console [ttySC0] enabled [ 1.666084] msm_serial: driver initialized [ 1.676874] loop: module loaded [ 1.684780] libphy: Fixed MDIO Bus: probed [ 1.689023] tun: Universal TUN/TAP device driver, 1.6 [ 1.694842] thunder_xcv, ver 1.0 [ 1.698099] thunder_bgx, ver 1.0 [ 1.701336] nicpf, ver 1.0 [ 1.704657] hclge is initializing [ 1.707971] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version [ 1.715189] hns3: Copyright (c) 2017 Huawei Corporation. 
[ 1.720525] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k [ 1.726355] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. [ 1.732292] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k [ 1.739250] igb: Copyright (c) 2007-2014 Intel Corporation. [ 1.744836] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k [ 1.752662] igbvf: Copyright (c) 2009 - 2012 Intel Corporation. [ 1.758910] sky2: driver version 1.30 [ 1.763530] VFIO - User Level meta-driver version: 0.3 [ 1.770067] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 1.776596] ehci-pci: EHCI PCI platform driver [ 1.781050] ehci-platform: EHCI generic platform driver [ 1.786609] ehci-platform ee0a0100.usb: EHCI Host Controller [ 1.792287] ehci-platform ee0a0100.usb: new USB bus registered, assigned bus number 1 [ 1.800200] ehci-platform ee0a0100.usb: irq 165, io mem 0xee0a0100 [ 1.821568] ehci-platform ee0a0100.usb: USB 2.0 started, EHCI 1.10 [ 1.828087] hub 1-0:1.0: USB hub found [ 1.831856] hub 1-0:1.0: 1 port detected [ 1.836044] ehci-platform ee0c0100.usb: EHCI Host Controller [ 1.841711] ehci-platform ee0c0100.usb: new USB bus registered, assigned bus number 2 [ 1.849592] ehci-platform ee0c0100.usb: irq 166, io mem 0xee0c0100 [ 1.869555] ehci-platform ee0c0100.usb: USB 2.0 started, EHCI 1.10 [ 1.875993] hub 2-0:1.0: USB hub found [ 1.879757] hub 2-0:1.0: 1 port detected [ 1.883910] ehci-orion: EHCI orion driver [ 1.888098] ehci-exynos: EHCI EXYNOS driver [ 1.892371] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver [ 1.898562] ohci-pci: OHCI PCI platform driver [ 1.903033] ohci-platform: OHCI generic platform driver [ 1.908496] ohci-platform ee0a0000.usb: Generic Platform OHCI controller [ 1.915209] ohci-platform ee0a0000.usb: new USB bus registered, assigned bus number 3 [ 1.923072] ohci-platform ee0a0000.usb: irq 165, io mem 0xee0a0000 [ 2.016534] hub 3-0:1.0: USB hub found [ 2.020298] hub 3-0:1.0: 1 port detected [ 2.024438] ohci-platform 
ee0c0000.usb: Generic Platform OHCI controller [ 2.031147] ohci-platform ee0c0000.usb: new USB bus registered, assigned bus number 4 [ 2.039026] ohci-platform ee0c0000.usb: irq 166, io mem 0xee0c0000 [ 2.132519] hub 4-0:1.0: USB hub found [ 2.136281] hub 4-0:1.0: 1 port detected [ 2.140417] ohci-exynos: OHCI EXYNOS driver [ 2.145110] xhci-hcd ee000000.usb: xHCI Host Controller [ 2.150344] xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 5 [ 2.157782] xhci-hcd ee000000.usb: Direct firmware load for r8a779x_usb3_v3.dlmem failed with error -2 [ 2.167098] xhci-hcd ee000000.usb: can't setup: -2 [ 2.171895] xhci-hcd ee000000.usb: USB bus 5 deregistered [ 2.177324] xhci-hcd: probe of ee000000.usb failed with error -2 [ 2.183655] usbcore: registered new interface driver usb-storage [ 2.192417] i2c /dev entries driver [ 2.203824] cs2000-cp 2-004f: revision - C1 [ 2.208051] i2c-rcar e6510000.i2c: probed [ 2.212397] pca953x 4-0020: 4-0020 supply vcc not found, using dummy regulator [ 2.220399] i2c-rcar e66d8000.i2c: probed [ 2.231022] rcar_gen3_thermal e6198000.thermal: TSC0: Loaded 1 trip points [ 2.242049] rcar_gen3_thermal e6198000.thermal: TSC1: Loaded 1 trip points [ 2.253051] rcar_gen3_thermal e6198000.thermal: TSC2: Loaded 2 trip points [ 2.262525] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 1499999 KHz [ 2.269954] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 1500000 KHz [ 2.278842] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1199999 KHz [ 2.286537] cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz [ 2.295864] sdhci: Secure Digital Host Controller Interface driver [ 2.302048] sdhci: Copyright(c) Pierre Ossman [ 2.307021] renesas_sdhi_internal_dmac ee100000.sd: Got CD GPIO [ 2.312959] renesas_sdhi_internal_dmac ee100000.sd: Got WP GPIO [ 2.389858] renesas_sdhi_internal_dmac ee140000.sd: IRQ index 1 not found [ 2.396653] renesas_sdhi_internal_dmac ee140000.sd: mmc0 base 
at 0xee140000 max clock rate 200 MHz [ 2.405964] renesas_sdhi_internal_dmac ee160000.sd: Got CD GPIO [ 2.411904] renesas_sdhi_internal_dmac ee160000.sd: Got WP GPIO [ 2.418211] Synopsys Designware Multimedia Card Interface Driver [ 2.425178] sdhci-pltfm: SDHCI platform and OF driver helper [ 2.432494] ledtrig-cpu: registered to indicate activity on CPUs [ 2.439571] usbcore: registered new interface driver usbhid [ 2.445144] usbhid: USB HID core driver [ 2.452453] NET: Registered protocol family 17 [ 2.457011] 9pnet: Installing 9P2000 support [ 2.461319] Key type dns_resolver registered [ 2.465799] registered taskstats version 1 [ 2.469898] Loading compiled-in X.509 certificates [ 2.482897] renesas_irqc e61c0000.interrupt-controller: driving 6 irqs [ 2.495600] bd9571mwv 7-0030: Device: BD9571MWV rev. 1 [ 2.515031] mmc0: new HS400 MMC card at address 0001 [ 2.520418] mmcblk0: mmc0:0001 BGSD3R 29.1 GiB [ 2.525131] mmcblk0boot0: mmc0:0001 BGSD3R partition 1 16.0 MiB [ 2.529575] ehci-platform ee080100.usb: EHCI Host Controller [ 2.531207] mmcblk0boot1: mmc0:0001 BGSD3R partition 2 16.0 MiB [ 2.536718] ehci-platform ee080100.usb: new USB bus registered, assigned bus number 5 [ 2.542734] mmcblk0rpmb: mmc0:0001 BGSD3R partition 3 4.00 MiB, chardev (237:0) [ 2.550499] ehci-platform ee080100.usb: irq 164, io mem 0xee080100 [ 2.558357] mmcblk0: p1 [ 2.577560] ehci-platform ee080100.usb: USB 2.0 started, EHCI 1.10 [ 2.584084] hub 5-0:1.0: USB hub found [ 2.587851] hub 5-0:1.0: 1 port detected [ 2.592849] ohci-platform ee080000.usb: Generic Platform OHCI controller [ 2.599569] ohci-platform ee080000.usb: new USB bus registered, assigned bus number 6 [ 2.607446] ohci-platform ee080000.usb: irq 164, io mem 0xee080000 [ 2.704528] hub 6-0:1.0: USB hub found [ 2.708295] hub 6-0:1.0: 1 port detected [ 2.713342] renesas_sdhi_internal_dmac ee100000.sd: Got CD GPIO [ 2.719283] renesas_sdhi_internal_dmac ee100000.sd: Got WP GPIO [ 2.795713] renesas_sdhi_internal_dmac ee100000.sd: IRQ 
index 1 not found [ 2.802509] renesas_sdhi_internal_dmac ee100000.sd: mmc1 base at 0xee100000 max clock rate 200 MHz [ 2.812389] renesas_sdhi_internal_dmac ee160000.sd: Got CD GPIO [ 2.818337] renesas_sdhi_internal_dmac ee160000.sd: Got WP GPIO [ 2.894683] renesas_sdhi_internal_dmac ee160000.sd: IRQ index 1 not found [ 2.901477] renesas_sdhi_internal_dmac ee160000.sd: mmc2 base at 0xee160000 max clock rate 200 MHz [ 2.914096] rcar-dmac e6700000.dma-controller: ignoring dependency for device, assuming no driver [ 2.925001] rcar-dmac e7300000.dma-controller: ignoring dependency for device, assuming no driver [ 2.935788] rcar-dmac e7310000.dma-controller: ignoring dependency for device, assuming no driver [ 2.946621] rcar-dmac ec700000.dma-controller: ignoring dependency for device, assuming no driver [ 2.957413] rcar-dmac ec720000.dma-controller: ignoring dependency for device, assuming no driver [ 2.968426] sata_rcar ee300000.sata: ignoring dependency for device, assuming no driver [ 2.976875] scsi host0: sata_rcar [ 2.980348] ata1: SATA max UDMA/133 irq 170 [ 2.985299] ravb e6800000.ethernet: ignoring dependency for device, assuming no driver [ 2.993512] libphy: ravb_mii: probed [ 2.998278] ravb e6800000.ethernet eth0: Base address at 0xe6800000, 2e:09:0a:00:83:ea, IRQ 116. 
[ 3.008624] input: keys as /devices/platform/keys/input/input0 [ 3.014713] hctosys: unable to open rtc device (rtc0) [ 3.096510] Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=175) [ 3.401564] ata1: link resume succeeded after 1 retries [ 3.513072] ata1: SATA link down (SStatus 0 SControl 300) [ 4.742059] ravb e6800000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off [ 4.773553] Sending DHCP requests .., [ 7.413975] random: fast init done [ 7.421550] OK [ 7.423320] IP-Config: Got DHCP answer from 192.168.44.74, my address is 192.168.44.104 [ 7.431336] IP-Config: Complete: [ 7.434568] device=eth0, hwaddr=2e:09:0a:00:83:ea, ipaddr=192.168.44.104, mask=255.255.255.0, gw=192.168.44.74 [ 7.445000] host=192.168.44.104, domain=shimoda-i7.org, nis-domain=(none) [ 7.452218] bootserver=192.168.44.74, rootserver=192.168.44.74, rootpath=/var/lib/tftpboot/aarch64/rootfs/buildroot [ 7.452220] nameserver0=192.168.44.74 [ 7.467553] SDHI0 Vcc: disabling [ 7.470782] SDHI3 Vcc: disabling [ 7.474008] SDHI0 VccQ: disabling [ 7.477316] SDHI3 VccQ: disabling [ 7.480632] ALSA device list: [ 7.483598] No soundcards found. [ 7.492496] VFS: Mounted root (nfs filesystem) on device 0:19. 
[ 7.498742] devtmpfs: mounted [ 7.504263] Freeing unused kernel memory: 4992K [ 7.513642] Run /sbin/init as init process [ 7.843871] Unable to handle kernel paging request at virtual address 0000000056000000 [ 7.851797] Mem abort info: [ 7.854589] ESR = 0x96000004 [ 7.857642] EC = 0x25: DABT (current EL), IL = 32 bits [ 7.862950] SET = 0, FnV = 0 [ 7.866001] EA = 0, S1PTW = 0 [ 7.869134] Data abort info: [ 7.872011] ISV = 0, ISS = 0x00000004 [ 7.875842] CM = 0, WnR = 0 [ 7.878806] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000774787000 [ 7.885242] [0000000056000000] pgd=0000000000000000 [ 7.890119] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 7.895684] Modules linked in: [ 7.898737] CPU: 2 PID: 1 Comm: systemd Not tainted 5.3.0-rc6-next-20190902-00001-g9709468 #48 [ 7.907340] Hardware name: Renesas Salvator-X board based on r8a7795 ES2.0+ (DT) [ 7.914729] pstate: 20000005 (nzCv daif -PAN -UAO) [ 7.919523] pc : dput+0x38/0x2e8 [ 7.922743] lr : dput+0x34/0x2e8 [ 7.925964] sp : ffff80001006bba0 [ 7.929270] x29: ffff80001006bba0 x28: ffff000735c98000 [ 7.934576] x27: 0000000000000000 x26: 0000000000000000 [ 7.939881] x25: 0000000056000000 x24: 0000000000004000 [ 7.945186] x23: 0000000000000001 x22: 0000000000080060 [ 7.950491] x21: 0000000000080040 x20: 0000000056000058 [ 7.955795] x19: 0000000056000000 x18: 0000000000000000 [ 7.961099] x17: 0000000000000000 x16: 0000000000000000 [ 7.966403] x15: 0000000000000000 x14: 0000000000000000 [ 7.971707] x13: 0000000000000000 x12: fefefefefefefeff [ 7.977011] x11: 0000ffffa01018b8 x10: 0000ffffa01018b8 [ 7.982315] x9 : 6bff3a3a375c19ff x8 : 00ffffa01018b800 [ 7.987620] x7 : 0000000000000000 x6 : 0000000000000000 [ 7.992924] x5 : 0000000000000064 x4 : 0000000c00000000 [ 7.998228] x3 : 0000000000000001 x2 : 0000000000000082 [ 8.003532] x1 : ffff000735c98000 x0 : 0000000000000001 [ 8.008838] Call trace: [ 8.011278] dput+0x38/0x2e8 [ 8.014155] terminate_walk+0xf4/0x120 [ 8.017897] path_lookupat+0xf8/0x1f8 [ 8.021553] 
filename_lookup+0x8c/0x160
[ 8.025382] user_path_at_empty+0x48/0x58
[ 8.029387] __arm64_sys_name_to_handle_at+0x64/0x2d0
[ 8.034435] el0_svc_common+0x68/0x178
[ 8.038177] el0_svc_handler+0x24/0x98
[ 8.041920] el0_svc+0x8/0xc
[ 8.044798] Code: 72a00115 52800037 97fb26b4 91016274 (b9400260)
[ 8.050895] ---[ end trace dd06490ec981282b ]---
[ 8.055966] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 8.063619] SMP: stopping secondary CPUs
[ 8.067539] Kernel Offset: disabled
[ 8.071021] CPU features: 0x0002,21006004
[ 8.075022] Memory Limit: none
[ 8.078076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Best regards,
Yoshihiro Shimoda
* RE: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
From: Yoshihiro Shimoda @ 2019-09-03 8:53 UTC
To: David Howells, viro
Cc: Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block

Hi,

> From: David Howells, Sent: Friday, August 30, 2019 10:58 PM
<snip>
> diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
> index 9063ede411ae..b8572e4d6a1b 100644
> --- a/drivers/usb/core/devio.c
> +++ b/drivers/usb/core/devio.c
> @@ -41,6 +41,7 @@
>  #include <linux/dma-mapping.h>
>  #include <asm/byteorder.h>
>  #include <linux/moduleparam.h>
> +#include <linux/watch_queue.h>
>
>  #include "usb.h"
>
> @@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev)
>  	}
>  }
>
> +#ifdef CONFIG_USB_NOTIFICATIONS
> +static noinline void post_usb_notification(const char *devname,
> +		enum usb_notification_type subtype,
> +		u32 error)
> +{
> +	unsigned int gran = WATCH_LENGTH_GRANULARITY;
> +	unsigned int name_len, n_len;
> +	u64 id = 0; /* Might want to put a dev# here. */
> +
> +	struct {
> +		struct usb_notification n;
> +		char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
> +			(sizeof(struct usb_notification) -
> +			offsetof(struct usb_notification, name))];
> +	} n;
> +
> +	name_len = strlen(devname);
> +	name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
> +	n_len = round_up(offsetof(struct usb_notification, name) + name_len,
> +		gran) / gran;
> +
> +	memset(&n, 0, sizeof(n));
> +	memcpy(n.n.name, devname, n_len);
> +
> +	n.n.watch.type = WATCH_TYPE_USB_NOTIFY;
> +	n.n.watch.subtype = subtype;
> +	n.n.watch.info = n_len;
> +	n.n.error = error;
> +	n.n.name_len = name_len;
> +
> +	post_device_notification(&n.n.watch, id);
> +}
> +
> +void post_usb_device_notification(const struct usb_device *udev,
> +		enum usb_notification_type subtype, u32 error)
> +{
> +	post_usb_notification(dev_name(&udev->dev), subtype, error);
> +}
> +
> +void post_usb_bus_notification(const struct usb_bus *ubus,

This function's argument is struct usb_bus *, but ...

> +		enum usb_notification_type subtype, u32 error)
> +{
> +	post_usb_notification(ubus->bus_name, subtype, error);
> +}
> +#endif
> +
>  static int usbdev_notify(struct notifier_block *self,
>  		unsigned long action, void *dev)
>  {
>  	switch (action) {
>  	case USB_DEVICE_ADD:
> +		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
>  		break;
>  	case USB_DEVICE_REMOVE:
> +		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
> +		usbdev_remove(dev);
> +		break;
> +	case USB_BUS_ADD:
> +		post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
> +		break;
> +	case USB_BUS_REMOVE:
> +		post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
>  		usbdev_remove(dev);

... this function calls usbdev_remove() with an incorrect argument if the
action is USB_BUS_REMOVE. This seems to cause the following issue [1] in my
environment (R-Car H3 / r8a7795 on next-20190902) [2]. However, I have no idea
how to fix it, so as a first step I am just reporting it.
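
The problem described above can be modelled in userspace. The following is a
minimal sketch, not kernel code: the model_* names, the kind tag and the two
counters are hypothetical stand-ins introduced only for illustration. In the
quoted hunk, the unchanged usbdev_remove(dev) context line that used to sit
under USB_DEVICE_REMOVE ends up under the new USB_BUS_REMOVE case, where dev
points at a struct usb_bus. One possible shape for the switch, assuming nothing
device-specific needs tearing down on bus removal, is to post the bus
notification and stop there:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct usb_device / struct usb_bus:
 * just a tag saying what the object is, plus a name. */
enum model_kind { MODEL_KIND_DEVICE, MODEL_KIND_BUS };

struct model_obj {
	enum model_kind kind;
	const char *name;
};

/* Mirrors the four notifier actions handled in the quoted hunk. */
enum model_action {
	MODEL_USB_DEVICE_ADD,
	MODEL_USB_DEVICE_REMOVE,
	MODEL_USB_BUS_ADD,
	MODEL_USB_BUS_REMOVE,
};

static int device_removals;	/* times the device teardown ran */
static int notifications;	/* times a notification was posted */

/* Models usbdev_remove(): only meaningful for a device object. */
static void model_usbdev_remove(const struct model_obj *obj)
{
	assert(obj->kind == MODEL_KIND_DEVICE);
	device_removals++;
}

/* Models the post_usb_*_notification() helpers. */
static void model_post_notification(const struct model_obj *obj,
				    const char *what)
{
	printf("%s: %s\n", obj->name, what);
	notifications++;
}

/* Models usbdev_notify() with the bus-removal case kept separate
 * from the device teardown. */
static int model_usbdev_notify(enum model_action action,
			       const struct model_obj *obj)
{
	switch (action) {
	case MODEL_USB_DEVICE_ADD:
		model_post_notification(obj, "device add");
		break;
	case MODEL_USB_DEVICE_REMOVE:
		model_post_notification(obj, "device remove");
		model_usbdev_remove(obj);
		break;
	case MODEL_USB_BUS_ADD:
		model_post_notification(obj, "bus add");
		break;
	case MODEL_USB_BUS_REMOVE:
		/* No device teardown here: obj is a bus, not a device. */
		model_post_notification(obj, "bus remove");
		break;
	}
	return 0;
}
```

With this shape, passing a bus object with MODEL_USB_BUS_REMOVE posts one
notification and never reaches model_usbdev_remove(). Whether the real
usbdev_notify() should behave the same way is for the patch author to confirm.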
JFYI, even if I have reverted this patch on next-20190902, other issue appears [3]. [1] The following panic happened. [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd073] [ 0.000000] Linux version 5.3.0-rc6-next-20190902 (shimoda@shimoda-RB02198) (gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02)) #47 SMP PREEMPT Tue Sep 3 17:42:01 JST 2019 [ 0.000000] Machine model: Renesas Salvator-X board based on r8a7795 ES2.0+ [ 0.000000] printk: debug: ignoring loglevel setting. [ 0.000000] efi: Getting EFI parameters from FDT: [ 0.000000] efi: UEFI not found. [ 0.000000] cma: Reserved 32 MiB at 0x00000000be000000 [ 0.000000] NUMA: No NUMA configuration found [ 0.000000] NUMA: Faking a node at [mem 0x0000000048000000-0x000000077fffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x77efdb800-0x77efdcfff] [ 0.000000] Zone ranges: [ 0.000000] DMA32 [mem 0x0000000048000000-0x00000000ffffffff] [ 0.000000] Normal [mem 0x0000000100000000-0x000000077fffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000048000000-0x00000000bfffffff] [ 0.000000] node 0: [mem 0x0000000500000000-0x000000057fffffff] [ 0.000000] node 0: [mem 0x0000000600000000-0x000000067fffffff] [ 0.000000] node 0: [mem 0x0000000700000000-0x000000077fffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000048000000-0x000000077fffffff] [ 0.000000] On node 0 totalpages: 2064384 [ 0.000000] DMA32 zone: 7680 pages used for memmap [ 0.000000] DMA32 zone: 0 pages reserved [ 0.000000] DMA32 zone: 491520 pages, LIFO batch:63 [ 0.000000] Normal zone: 24576 pages used for memmap [ 0.000000] Normal zone: 1572864 pages, LIFO batch:63 [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv1.1 detected in firmware. 
[ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: Trusted OS migration not required [ 0.000000] psci: SMC Calling Convention v1.1 [ 0.000000] percpu: Embedded 22 pages/cpu s52952 r8192 d28968 u90112 [ 0.000000] pcpu-alloc: s52952 r8192 d28968 u90112 alloc=22*4096 [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 [ 0.000000] Detected PIPT I-cache on CPU0 [ 0.000000] CPU features: detected: EL2 vector hardening [ 0.000000] Speculative Store Bypass Disable mitigation not required [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 2032128 [ 0.000000] Policy zone: Normal [ 0.000000] Kernel command line: console=ttySC0,115200 ignore_loglevel consoleblank=0 rw root=/dev/nfs ip=dhcp [ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] software IO TLB: mapped [mem 0xba000000-0xbe000000] (64MB) [ 0.000000] Memory: 7972368K/8257536K available (12092K kernel code, 1846K rwdata, 6320K rodata, 4992K init, 450K bss, 252400K reserved, 32768K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8. [ 0.000000] Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8 [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f102f000 [ 0.000000] GIC: Using split EOI/Deactivate mode [ 0.000000] random: get_random_bytes called from start_kernel+0x2f0/0x490 with crng_init=0 [ 0.000000] arch_timer: cp15 timer(s) running at 8.33MHz (phys). 
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1ec02923e, max_idle_ns: 440795202125 ns [ 0.000003] sched_clock: 56 bits at 8MHz, resolution 120ns, wraps every 2199023255496ns [ 0.000142] Console: colour dummy device 80x25 [ 0.000211] Calibrating delay loop (skipped), value calculated using timer frequency.. 16.66 BogoMIPS (lpj=33333) [ 0.000218] pid_max: default: 32768 minimum: 301 [ 0.000273] LSM: Security Framework initializing [ 0.000351] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.000397] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.023974] ASID allocator initialised with 32768 entries [ 0.031963] rcu: Hierarchical SRCU implementation. [ 0.041031] Detected Renesas R-Car Gen3 r8a7795 ES3.0 [ 0.042354] EFI services will not be available. [ 0.047989] smp: Bringing up secondary CPUs ... [ 0.080173] Detected PIPT I-cache on CPU1 [ 0.080213] CPU1: Booted secondary processor 0x0000000001 [0x411fd073] [ 0.112190] Detected PIPT I-cache on CPU2 [ 0.112210] CPU2: Booted secondary processor 0x0000000002 [0x411fd073] [ 0.144225] Detected PIPT I-cache on CPU3 [ 0.144244] CPU3: Booted secondary processor 0x0000000003 [0x411fd073] [ 0.176267] CPU features: detected: ARM erratum 845719 [ 0.176278] Detected VIPT I-cache on CPU4 [ 0.176316] CPU4: Booted secondary processor 0x0000000100 [0x410fd034] [ 0.208292] Detected VIPT I-cache on CPU5 [ 0.208316] CPU5: Booted secondary processor 0x0000000101 [0x410fd034] [ 0.240331] Detected VIPT I-cache on CPU6 [ 0.240354] CPU6: Booted secondary processor 0x0000000102 [0x410fd034] [ 0.272365] Detected VIPT I-cache on CPU7 [ 0.272389] CPU7: Booted secondary processor 0x0000000103 [0x410fd034] [ 0.272464] smp: Brought up 1 node, 8 CPUs [ 0.272484] SMP: Total of 8 processors activated. 
[ 0.272488] CPU features: detected: 32-bit EL0 Support [ 0.272493] CPU features: detected: CRC32 instructions [ 0.282612] CPU: All CPU(s) started at EL2 [ 0.282644] alternatives: patching kernel code [ 0.283676] devtmpfs: initialized [ 0.289458] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns [ 0.289471] futex hash table entries: 2048 (order: 5, 131072 bytes, linear) [ 0.290163] pinctrl core: initialized pinctrl subsystem [ 0.291360] DMI not present or invalid. [ 0.291607] NET: Registered protocol family 16 [ 0.292388] DMA: preallocated 256 KiB pool for atomic allocations [ 0.292399] audit: initializing netlink subsys (disabled) [ 0.292539] audit: type=2000 audit(0.292:1): state=initialized audit_enabled=0 res=1 [ 0.293573] cpuidle: using governor menu [ 0.293733] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 0.294678] Serial: AMBA PL011 UART driver [ 0.296912] sh-pfc e6060000.pin-controller: IRQ index 0 not found [ 0.297125] sh-pfc e6060000.pin-controller: r8a77951_pfc support registered [ 0.317432] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [ 0.317439] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages [ 0.317443] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 0.317447] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages [ 0.319199] cryptd: max_cpu_qlen set to 1000 [ 0.322091] ACPI: Interpreter disabled. [ 0.325627] iommu: Default domain type: Translated [ 0.325823] vgaarb: loaded [ 0.326011] SCSI subsystem initialized [ 0.326113] libata version 3.00 loaded. [ 0.326243] usbcore: registered new interface driver usbfs [ 0.326264] usbcore: registered new interface driver hub [ 0.326307] usbcore: registered new device driver usb [ 0.327255] i2c-sh_mobile e60b0000.i2c: I2C adapter 7, bus speed 400000 Hz [ 0.327560] pps_core: LinuxPPS API ver. 1 registered [ 0.327564] pps_core: Software ver. 
5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it> [ 0.327573] PTP clock support registered [ 0.327701] EDAC MC: Ver: 3.0.0 [ 0.328991] FPGA manager framework [ 0.329031] Advanced Linux Sound Architecture Driver Initialized. [ 0.329497] clocksource: Switched to clocksource arch_sys_counter [ 0.329639] VFS: Disk quotas dquot_6.6.0 [ 0.329682] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.329800] pnp: PnP ACPI: disabled [ 0.332764] thermal_sys: Registered thermal governor 'step_wise' [ 0.332767] thermal_sys: Registered thermal governor 'power_allocator' [ 0.333270] NET: Registered protocol family 2 [ 0.333558] tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear) [ 0.333624] TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear) [ 0.333903] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear) [ 0.334489] TCP: Hash tables configured (established 65536 bind 65536) [ 0.334606] UDP hash table entries: 4096 (order: 5, 131072 bytes, linear) [ 0.334714] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear) [ 0.334929] NET: Registered protocol family 1 [ 0.335290] RPC: Registered named UNIX socket transport module. [ 0.335295] RPC: Registered udp transport module. [ 0.335299] RPC: Registered tcp transport module. [ 0.335302] RPC: Registered tcp NFSv4.1 backchannel transport module. 
[ 0.335311] PCI: CLS 0 bytes, default 64 [ 0.336141] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available [ 0.336377] hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available [ 0.336799] kvm [1]: IPA Size Limit: 40bits [ 0.337273] kvm [1]: vgic interrupt IRQ1 [ 0.337415] kvm [1]: Hyp mode initialized successfully [ 0.341775] Initialise system trusted keyrings [ 0.341864] workingset: timestamp_bits=44 max_order=21 bucket_order=0 [ 0.345076] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 0.345515] NFS: Registering the id_resolver key type [ 0.345532] Key type id_resolver registered [ 0.345535] Key type id_legacy registered [ 0.345544] nfs4filelayout_init: NFSv4 File Layout Driver Registering... [ 0.345638] 9p: Installing v9fs 9p2000 file system support [ 0.354995] Key type asymmetric registered [ 0.355001] Asymmetric key parser 'x509' registered [ 0.355027] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245) [ 0.355032] io scheduler mq-deadline registered [ 0.355036] io scheduler kyber registered [ 0.359639] phy_rcar_gen3_usb2 ee0a0200.usb-phy: IRQ index 0 not found [ 0.360346] phy_rcar_gen3_usb2 ee0c0200.usb-phy: IRQ index 0 not found [ 0.366010] gpio_rcar e6050000.gpio: driving 16 GPIOs [ 0.366187] gpio_rcar e6051000.gpio: driving 29 GPIOs [ 0.366348] gpio_rcar e6052000.gpio: driving 15 GPIOs [ 0.366504] gpio_rcar e6053000.gpio: driving 16 GPIOs [ 0.366663] gpio_rcar e6054000.gpio: driving 18 GPIOs [ 0.366816] gpio_rcar e6055000.gpio: driving 26 GPIOs [ 0.366973] gpio_rcar e6055400.gpio: driving 32 GPIOs [ 0.367126] gpio_rcar e6055800.gpio: driving 4 GPIOs [ 0.368571] rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges: [ 0.368596] rcar-pcie fe000000.pcie: IO 0xfe100000..0xfe1fffff -> 0x00000000 [ 0.368613] rcar-pcie fe000000.pcie: MEM 0xfe200000..0xfe3fffff -> 0xfe200000 [ 0.368626] rcar-pcie fe000000.pcie: MEM 0x30000000..0x37ffffff -> 0x30000000 [ 0.368635] rcar-pcie fe000000.pcie: 
MEM 0x38000000..0x3fffffff -> 0x38000000 [ 0.433003] rcar-pcie fe000000.pcie: PCIe link down [ 0.433148] rcar-pcie ee800000.pcie: host bridge /soc/pcie@ee800000 ranges: [ 0.433165] rcar-pcie ee800000.pcie: IO 0xee900000..0xee9fffff -> 0x00000000 [ 0.433179] rcar-pcie ee800000.pcie: MEM 0xeea00000..0xeebfffff -> 0xeea00000 [ 0.433191] rcar-pcie ee800000.pcie: MEM 0xc0000000..0xc7ffffff -> 0xc0000000 [ 0.433200] rcar-pcie ee800000.pcie: MEM 0xc8000000..0xcfffffff -> 0xc8000000 [ 0.496985] rcar-pcie ee800000.pcie: PCIe link down [ 0.498893] EINJ: ACPI disabled. [ 0.510430] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.512246] SuperH (H)SCI(F) driver initialized [ 0.512568] sh-sci e6550000.serial: IRQ index 1 not found [ 0.512577] sh-sci e6550000.serial: IRQ index 2 not found [ 0.512584] sh-sci e6550000.serial: IRQ index 3 not found [ 0.512591] sh-sci e6550000.serial: IRQ index 4 not found [ 0.512597] sh-sci e6550000.serial: IRQ index 5 not found [ 0.512647] e6550000.serial: ttySC1 at MMIO 0xe6550000 (irq = 34, base_baud = 0) is a hscif [ 0.513065] sh-sci e6e88000.serial: IRQ index 1 not found [ 0.513073] sh-sci e6e88000.serial: IRQ index 2 not found [ 0.513079] sh-sci e6e88000.serial: IRQ index 3 not found [ 0.513086] sh-sci e6e88000.serial: IRQ index 4 not found [ 0.513092] sh-sci e6e88000.serial: IRQ index 5 not found [ 0.513119] e6e88000.serial: ttySC0 at MMIO 0xe6e88000 (irq = 119, base_baud = 0) is a scif [ 1.655695] printk: console [ttySC0] enabled [ 1.660706] msm_serial: driver initialized [ 1.671544] loop: module loaded [ 1.679482] libphy: Fixed MDIO Bus: probed [ 1.683719] tun: Universal TUN/TAP device driver, 1.6 [ 1.689559] thunder_xcv, ver 1.0 [ 1.692805] thunder_bgx, ver 1.0 [ 1.696052] nicpf, ver 1.0 [ 1.699373] hclge is initializing [ 1.702688] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version [ 1.709907] hns3: Copyright (c) 2017 Huawei Corporation. 
[ 1.715242] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k [ 1.721073] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. [ 1.727012] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k [ 1.733971] igb: Copyright (c) 2007-2014 Intel Corporation. [ 1.739557] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k [ 1.747383] igbvf: Copyright (c) 2009 - 2012 Intel Corporation. [ 1.753636] sky2: driver version 1.30 [ 1.758264] VFIO - User Level meta-driver version: 0.3 [ 1.764783] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 1.771320] ehci-pci: EHCI PCI platform driver [ 1.775780] ehci-platform: EHCI generic platform driver [ 1.781335] ehci-platform ee0a0100.usb: EHCI Host Controller [ 1.787016] ehci-platform ee0a0100.usb: new USB bus registered, assigned bus number 1 [ 1.794935] ehci-platform ee0a0100.usb: irq 165, io mem 0xee0a0100 [ 1.813507] ehci-platform ee0a0100.usb: USB 2.0 started, EHCI 1.10 [ 1.820044] hub 1-0:1.0: USB hub found [ 1.823828] hub 1-0:1.0: 1 port detected [ 1.828017] ehci-platform ee0c0100.usb: EHCI Host Controller [ 1.833684] ehci-platform ee0c0100.usb: new USB bus registered, assigned bus number 2 [ 1.841560] ehci-platform ee0c0100.usb: irq 166, io mem 0xee0c0100 [ 1.861506] ehci-platform ee0c0100.usb: USB 2.0 started, EHCI 1.10 [ 1.867940] hub 2-0:1.0: USB hub found [ 1.871704] hub 2-0:1.0: 1 port detected [ 1.875860] ehci-orion: EHCI orion driver [ 1.880049] ehci-exynos: EHCI EXYNOS driver [ 1.884320] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver [ 1.890509] ohci-pci: OHCI PCI platform driver [ 1.894978] ohci-platform: OHCI generic platform driver [ 1.900444] ohci-platform ee0a0000.usb: Generic Platform OHCI controller [ 1.907159] ohci-platform ee0a0000.usb: new USB bus registered, assigned bus number 3 [ 1.915025] ohci-platform ee0a0000.usb: irq 165, io mem 0xee0a0000 [ 2.008477] hub 3-0:1.0: USB hub found [ 2.012244] hub 3-0:1.0: 1 port detected [ 2.016388] ohci-platform 
ee0c0000.usb: Generic Platform OHCI controller [ 2.023097] ohci-platform ee0c0000.usb: new USB bus registered, assigned bus number 4 [ 2.030977] ohci-platform ee0c0000.usb: irq 166, io mem 0xee0c0000 [ 2.124457] hub 4-0:1.0: USB hub found [ 2.128220] hub 4-0:1.0: 1 port detected [ 2.132361] ohci-exynos: OHCI EXYNOS driver [ 2.137069] xhci-hcd ee000000.usb: xHCI Host Controller [ 2.142305] xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 5 [ 2.149748] xhci-hcd ee000000.usb: Direct firmware load for r8a779x_usb3_v3.dlmem failed with error -2 [ 2.159063] xhci-hcd ee000000.usb: can't setup: -2 [ 2.163861] xhci-hcd ee000000.usb: USB bus 5 deregistered [ 2.169266] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [ 2.178042] Mem abort info: [ 2.180828] ESR = 0x96000004 [ 2.183876] EC = 0x25: DABT (current EL), IL = 32 bits [ 2.189179] SET = 0, FnV = 0 [ 2.192226] EA = 0, S1PTW = 0 [ 2.195358] Data abort info: [ 2.198231] ISV = 0, ISS = 0x00000004 [ 2.202058] CM = 0, WnR = 0 [ 2.205019] [0000000000000020] user address but active_mm is swapper [ 2.211366] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2.216930] Modules linked in: [ 2.219981] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc6-next-20190902 #47 [ 2.227456] Hardware name: Renesas Salvator-X board based on r8a7795 ES2.0+ (DT) [ 2.234844] pstate: a0000085 (NzCv daIf -PAN -UAO) [ 2.239638] pc : _raw_write_lock+0x68/0x288 [ 2.243819] lr : destroy_async+0x20/0xb0 [ 2.247733] sp : ffff80001006b9b0 [ 2.251040] x29: ffff80001006b9b0 x28: ffff800011186fd8 [ 2.256345] x27: ffff800011186fc0 x26: 00000000ffffffed [ 2.261650] x25: ffff8000118ff570 x24: ffff8000118ff000 [ 2.266955] x23: 0000000000000004 x22: ffff800011900000 [ 2.272259] x21: 0000000000000020 x20: 0000000000000028 [ 2.277564] x19: 0000000000000000 x18: 0000000000000005 [ 2.282868] x17: 0000000000000020 x16: ffff800010d164b0 [ 2.288172] x15: ffff8000118ff6e8 x14: ffff000735867958 [ 2.293476] x13: 
0000000000000000 x12: ffff8000118ff6e8 [ 2.298779] x11: ffff000735867908 x10: 0000000000000040 [ 2.304083] x9 : ffff8000118ff6f0 x8 : ffff8000118ff6e8 [ 2.309388] x7 : ffff000735867958 x6 : 0000000000000000 [ 2.314691] x5 : 0000000000000001 x4 : 0000000000000000 [ 2.319995] x3 : 0000000000000020 x2 : 0000000000000001 [ 2.325299] x1 : 0000000000000000 x0 : 0000000000000001 [ 2.330604] Call trace: [ 2.333045] _raw_write_lock+0x68/0x288 [ 2.336874] destroy_async+0x20/0xb0 [ 2.340443] usbdev_remove+0x3c/0xc0 [ 2.344011] usbdev_notify+0x20/0x38 [ 2.347583] notifier_call_chain+0x54/0x98 [ 2.351672] blocking_notifier_call_chain+0x48/0x70 [ 2.356543] usb_notify_remove_bus+0x1c/0x28 [ 2.360808] usb_deregister_bus+0x58/0x68 [ 2.364811] usb_add_hcd+0x234/0x730 [ 2.368381] xhci_plat_probe+0x4ec/0x650 [ 2.372302] platform_drv_probe+0x50/0xa0 [ 2.376305] really_probe+0xdc/0x350 [ 2.379874] driver_probe_device+0x58/0x100 [ 2.384050] device_driver_attach+0x6c/0x90 [ 2.388226] __driver_attach+0x84/0xc8 [ 2.391968] bus_for_each_dev+0x74/0xc8 [ 2.395796] driver_attach+0x20/0x28 [ 2.399365] bus_add_driver+0x148/0x1f0 [ 2.403193] driver_register+0x60/0x110 [ 2.407022] __platform_driver_register+0x40/0x48 [ 2.411723] xhci_plat_init+0x2c/0x34 [ 2.415380] do_one_initcall+0x5c/0x1b0 [ 2.419213] kernel_init_freeable+0x1a4/0x24c [ 2.423564] kernel_init+0x10/0x108 [ 2.427045] ret_from_fork+0x10/0x18 [ 2.430617] Code: 97d3f2b6 a8c17bfd d65f03c0 f9800071 (885ffc60) [ 2.436717] ---[ end trace 33e4fb349eb48047 ]--- [ 2.441345] note: swapper/0[1] exited with preempt_count 1 [ 2.446846] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 2.454497] SMP: stopping secondary CPUs [ 2.458416] Kernel Offset: disabled [ 2.461898] CPU features: 0x0002,21006004 [ 2.465899] Memory Limit: none [ 2.468950] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- [2] I'm using the arch/arm64 defconfig with CONFIG_FW_LOADER_USER_HELPER disabled.
[3] The following panic happened when I reverted the commit ef9cc255c9539288f119156412d23a4b785f3599 on next-20190902. [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd073] [ 0.000000] Linux version 5.3.0-rc6-next-20190902-00001-g9709468 (shimoda@shimoda-RB02198) (gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02)) #48 SMP PREEMPT Tue Sep 3 17:46:54 JST 2019 [ 0.000000] Machine model: Renesas Salvator-X board based on r8a7795 ES2.0+ [ 0.000000] printk: debug: ignoring loglevel setting. [ 0.000000] efi: Getting EFI parameters from FDT: [ 0.000000] efi: UEFI not found. [ 0.000000] cma: Reserved 32 MiB at 0x00000000be000000 [ 0.000000] NUMA: No NUMA configuration found [ 0.000000] NUMA: Faking a node at [mem 0x0000000048000000-0x000000077fffffff] [ 0.000000] NUMA: NODE_DATA [mem 0x77efdb800-0x77efdcfff] [ 0.000000] Zone ranges: [ 0.000000] DMA32 [mem 0x0000000048000000-0x00000000ffffffff] [ 0.000000] Normal [mem 0x0000000100000000-0x000000077fffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000048000000-0x00000000bfffffff] [ 0.000000] node 0: [mem 0x0000000500000000-0x000000057fffffff] [ 0.000000] node 0: [mem 0x0000000600000000-0x000000067fffffff] [ 0.000000] node 0: [mem 0x0000000700000000-0x000000077fffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000048000000-0x000000077fffffff] [ 0.000000] On node 0 totalpages: 2064384 [ 0.000000] DMA32 zone: 7680 pages used for memmap [ 0.000000] DMA32 zone: 0 pages reserved [ 0.000000] DMA32 zone: 491520 pages, LIFO batch:63 [ 0.000000] Normal zone: 24576 pages used for memmap [ 0.000000] Normal zone: 1572864 pages, LIFO batch:63 [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv1.1 detected in firmware. 
[ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: Trusted OS migration not required [ 0.000000] psci: SMC Calling Convention v1.1 [ 0.000000] percpu: Embedded 22 pages/cpu s52952 r8192 d28968 u90112 [ 0.000000] pcpu-alloc: s52952 r8192 d28968 u90112 alloc=22*4096 [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 [ 0.000000] Detected PIPT I-cache on CPU0 [ 0.000000] CPU features: detected: EL2 vector hardening [ 0.000000] Speculative Store Bypass Disable mitigation not required [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 2032128 [ 0.000000] Policy zone: Normal [ 0.000000] Kernel command line: console=ttySC0,115200 ignore_loglevel consoleblank=0 rw root=/dev/nfs ip=dhcp [ 0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear) [ 0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] software IO TLB: mapped [mem 0xba000000-0xbe000000] (64MB) [ 0.000000] Memory: 7972368K/8257536K available (12092K kernel code, 1846K rwdata, 6320K rodata, 4992K init, 450K bss, 252400K reserved, 32768K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 [ 0.000000] rcu: Preemptible hierarchical RCU implementation. [ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8. [ 0.000000] Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8 [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f102f000 [ 0.000000] GIC: Using split EOI/Deactivate mode [ 0.000000] random: get_random_bytes called from start_kernel+0x2f0/0x490 with crng_init=0 [ 0.000000] arch_timer: cp15 timer(s) running at 8.33MHz (phys). 
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1ec02923e, max_idle_ns: 440795202125 ns [ 0.000002] sched_clock: 56 bits at 8MHz, resolution 120ns, wraps every 2199023255496ns [ 0.000141] Console: colour dummy device 80x25 [ 0.000207] Calibrating delay loop (skipped), value calculated using timer frequency.. 16.66 BogoMIPS (lpj=33333) [ 0.000215] pid_max: default: 32768 minimum: 301 [ 0.000270] LSM: Security Framework initializing [ 0.000352] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.000398] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear) [ 0.023978] ASID allocator initialised with 32768 entries [ 0.031968] rcu: Hierarchical SRCU implementation. [ 0.041040] Detected Renesas R-Car Gen3 r8a7795 ES3.0 [ 0.042364] EFI services will not be available. [ 0.047996] smp: Bringing up secondary CPUs ... [ 0.080183] Detected PIPT I-cache on CPU1 [ 0.080222] CPU1: Booted secondary processor 0x0000000001 [0x411fd073] [ 0.112195] Detected PIPT I-cache on CPU2 [ 0.112216] CPU2: Booted secondary processor 0x0000000002 [0x411fd073] [ 0.144232] Detected PIPT I-cache on CPU3 [ 0.144253] CPU3: Booted secondary processor 0x0000000003 [0x411fd073] [ 0.176276] CPU features: detected: ARM erratum 845719 [ 0.176286] Detected VIPT I-cache on CPU4 [ 0.176324] CPU4: Booted secondary processor 0x0000000100 [0x410fd034] [ 0.208297] Detected VIPT I-cache on CPU5 [ 0.208321] CPU5: Booted secondary processor 0x0000000101 [0x410fd034] [ 0.240338] Detected VIPT I-cache on CPU6 [ 0.240361] CPU6: Booted secondary processor 0x0000000102 [0x410fd034] [ 0.272375] Detected VIPT I-cache on CPU7 [ 0.272398] CPU7: Booted secondary processor 0x0000000103 [0x410fd034] [ 0.272473] smp: Brought up 1 node, 8 CPUs [ 0.272492] SMP: Total of 8 processors activated. 
[ 0.272497] CPU features: detected: 32-bit EL0 Support [ 0.272502] CPU features: detected: CRC32 instructions [ 0.282749] CPU: All CPU(s) started at EL2 [ 0.282777] alternatives: patching kernel code [ 0.283815] devtmpfs: initialized [ 0.289644] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns [ 0.289659] futex hash table entries: 2048 (order: 5, 131072 bytes, linear) [ 0.290353] pinctrl core: initialized pinctrl subsystem [ 0.291538] DMI not present or invalid. [ 0.291777] NET: Registered protocol family 16 [ 0.292561] DMA: preallocated 256 KiB pool for atomic allocations [ 0.292571] audit: initializing netlink subsys (disabled) [ 0.292709] audit: type=2000 audit(0.292:1): state=initialized audit_enabled=0 res=1 [ 0.293743] cpuidle: using governor menu [ 0.293898] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 0.294862] Serial: AMBA PL011 UART driver [ 0.297061] sh-pfc e6060000.pin-controller: IRQ index 0 not found [ 0.297280] sh-pfc e6060000.pin-controller: r8a77951_pfc support registered [ 0.317490] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [ 0.317498] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages [ 0.317503] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 0.317506] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages [ 0.319278] cryptd: max_cpu_qlen set to 1000 [ 0.322162] ACPI: Interpreter disabled. [ 0.325707] iommu: Default domain type: Translated [ 0.325909] vgaarb: loaded [ 0.326100] SCSI subsystem initialized [ 0.326199] libata version 3.00 loaded. [ 0.326333] usbcore: registered new interface driver usbfs [ 0.326353] usbcore: registered new interface driver hub [ 0.326395] usbcore: registered new device driver usb [ 0.327336] i2c-sh_mobile e60b0000.i2c: I2C adapter 7, bus speed 400000 Hz [ 0.327631] pps_core: LinuxPPS API ver. 1 registered [ 0.327636] pps_core: Software ver. 
5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it> [ 0.327644] PTP clock support registered [ 0.327770] EDAC MC: Ver: 3.0.0 [ 0.329047] FPGA manager framework [ 0.329089] Advanced Linux Sound Architecture Driver Initialized. [ 0.329548] clocksource: Switched to clocksource arch_sys_counter [ 0.329696] VFS: Disk quotas dquot_6.6.0 [ 0.329738] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.329855] pnp: PnP ACPI: disabled [ 0.332843] thermal_sys: Registered thermal governor 'step_wise' [ 0.332845] thermal_sys: Registered thermal governor 'power_allocator' [ 0.333337] NET: Registered protocol family 2 [ 0.333618] tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear) [ 0.333682] TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear) [ 0.333961] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear) [ 0.334551] TCP: Hash tables configured (established 65536 bind 65536) [ 0.334661] UDP hash table entries: 4096 (order: 5, 131072 bytes, linear) [ 0.334769] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear) [ 0.334979] NET: Registered protocol family 1 [ 0.335329] RPC: Registered named UNIX socket transport module. [ 0.335333] RPC: Registered udp transport module. [ 0.335337] RPC: Registered tcp transport module. [ 0.335340] RPC: Registered tcp NFSv4.1 backchannel transport module. 
[ 0.335349] PCI: CLS 0 bytes, default 64 [ 0.336179] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available [ 0.336413] hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available [ 0.336837] kvm [1]: IPA Size Limit: 40bits [ 0.337313] kvm [1]: vgic interrupt IRQ1 [ 0.337458] kvm [1]: Hyp mode initialized successfully [ 0.341834] Initialise system trusted keyrings [ 0.341924] workingset: timestamp_bits=44 max_order=21 bucket_order=0 [ 0.345148] squashfs: version 4.0 (2009/01/31) Phillip Lougher [ 0.345585] NFS: Registering the id_resolver key type [ 0.345602] Key type id_resolver registered [ 0.345606] Key type id_legacy registered [ 0.345615] nfs4filelayout_init: NFSv4 File Layout Driver Registering... [ 0.345714] 9p: Installing v9fs 9p2000 file system support [ 0.354814] Key type asymmetric registered [ 0.354819] Asymmetric key parser 'x509' registered [ 0.354844] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245) [ 0.354849] io scheduler mq-deadline registered [ 0.354853] io scheduler kyber registered [ 0.359476] phy_rcar_gen3_usb2 ee0a0200.usb-phy: IRQ index 0 not found [ 0.360174] phy_rcar_gen3_usb2 ee0c0200.usb-phy: IRQ index 0 not found [ 0.365881] gpio_rcar e6050000.gpio: driving 16 GPIOs [ 0.366057] gpio_rcar e6051000.gpio: driving 29 GPIOs [ 0.366222] gpio_rcar e6052000.gpio: driving 15 GPIOs [ 0.366376] gpio_rcar e6053000.gpio: driving 16 GPIOs [ 0.366536] gpio_rcar e6054000.gpio: driving 18 GPIOs [ 0.366688] gpio_rcar e6055000.gpio: driving 26 GPIOs [ 0.366860] gpio_rcar e6055400.gpio: driving 32 GPIOs [ 0.367015] gpio_rcar e6055800.gpio: driving 4 GPIOs [ 0.368473] rcar-pcie fe000000.pcie: host bridge /soc/pcie@fe000000 ranges: [ 0.368498] rcar-pcie fe000000.pcie: IO 0xfe100000..0xfe1fffff -> 0x00000000 [ 0.368514] rcar-pcie fe000000.pcie: MEM 0xfe200000..0xfe3fffff -> 0xfe200000 [ 0.368527] rcar-pcie fe000000.pcie: MEM 0x30000000..0x37ffffff -> 0x30000000 [ 0.368536] rcar-pcie fe000000.pcie: 
MEM 0x38000000..0x3fffffff -> 0x38000000 [ 0.437037] rcar-pcie fe000000.pcie: PCIe link down [ 0.437187] rcar-pcie ee800000.pcie: host bridge /soc/pcie@ee800000 ranges: [ 0.437205] rcar-pcie ee800000.pcie: IO 0xee900000..0xee9fffff -> 0x00000000 [ 0.437218] rcar-pcie ee800000.pcie: MEM 0xeea00000..0xeebfffff -> 0xeea00000 [ 0.437230] rcar-pcie ee800000.pcie: MEM 0xc0000000..0xc7ffffff -> 0xc0000000 [ 0.437239] rcar-pcie ee800000.pcie: MEM 0xc8000000..0xcfffffff -> 0xc8000000 [ 0.501036] rcar-pcie ee800000.pcie: PCIe link down [ 0.502959] EINJ: ACPI disabled. [ 0.514458] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled [ 0.516285] SuperH (H)SCI(F) driver initialized [ 0.516608] sh-sci e6550000.serial: IRQ index 1 not found [ 0.516616] sh-sci e6550000.serial: IRQ index 2 not found [ 0.516624] sh-sci e6550000.serial: IRQ index 3 not found [ 0.516630] sh-sci e6550000.serial: IRQ index 4 not found [ 0.516637] sh-sci e6550000.serial: IRQ index 5 not found [ 0.516685] e6550000.serial: ttySC1 at MMIO 0xe6550000 (irq = 34, base_baud = 0) is a hscif [ 0.517112] sh-sci e6e88000.serial: IRQ index 1 not found [ 0.517121] sh-sci e6e88000.serial: IRQ index 2 not found [ 0.517128] sh-sci e6e88000.serial: IRQ index 3 not found [ 0.517134] sh-sci e6e88000.serial: IRQ index 4 not found [ 0.517140] sh-sci e6e88000.serial: IRQ index 5 not found [ 0.517169] e6e88000.serial: ttySC0 at MMIO 0xe6e88000 (irq = 119, base_baud = 0) is a scif [ 1.661047] printk: console [ttySC0] enabled [ 1.666084] msm_serial: driver initialized [ 1.676874] loop: module loaded [ 1.684780] libphy: Fixed MDIO Bus: probed [ 1.689023] tun: Universal TUN/TAP device driver, 1.6 [ 1.694842] thunder_xcv, ver 1.0 [ 1.698099] thunder_bgx, ver 1.0 [ 1.701336] nicpf, ver 1.0 [ 1.704657] hclge is initializing [ 1.707971] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version [ 1.715189] hns3: Copyright (c) 2017 Huawei Corporation. 
[ 1.720525] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k [ 1.726355] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. [ 1.732292] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.6.0-k [ 1.739250] igb: Copyright (c) 2007-2014 Intel Corporation. [ 1.744836] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k [ 1.752662] igbvf: Copyright (c) 2009 - 2012 Intel Corporation. [ 1.758910] sky2: driver version 1.30 [ 1.763530] VFIO - User Level meta-driver version: 0.3 [ 1.770067] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 1.776596] ehci-pci: EHCI PCI platform driver [ 1.781050] ehci-platform: EHCI generic platform driver [ 1.786609] ehci-platform ee0a0100.usb: EHCI Host Controller [ 1.792287] ehci-platform ee0a0100.usb: new USB bus registered, assigned bus number 1 [ 1.800200] ehci-platform ee0a0100.usb: irq 165, io mem 0xee0a0100 [ 1.821568] ehci-platform ee0a0100.usb: USB 2.0 started, EHCI 1.10 [ 1.828087] hub 1-0:1.0: USB hub found [ 1.831856] hub 1-0:1.0: 1 port detected [ 1.836044] ehci-platform ee0c0100.usb: EHCI Host Controller [ 1.841711] ehci-platform ee0c0100.usb: new USB bus registered, assigned bus number 2 [ 1.849592] ehci-platform ee0c0100.usb: irq 166, io mem 0xee0c0100 [ 1.869555] ehci-platform ee0c0100.usb: USB 2.0 started, EHCI 1.10 [ 1.875993] hub 2-0:1.0: USB hub found [ 1.879757] hub 2-0:1.0: 1 port detected [ 1.883910] ehci-orion: EHCI orion driver [ 1.888098] ehci-exynos: EHCI EXYNOS driver [ 1.892371] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver [ 1.898562] ohci-pci: OHCI PCI platform driver [ 1.903033] ohci-platform: OHCI generic platform driver [ 1.908496] ohci-platform ee0a0000.usb: Generic Platform OHCI controller [ 1.915209] ohci-platform ee0a0000.usb: new USB bus registered, assigned bus number 3 [ 1.923072] ohci-platform ee0a0000.usb: irq 165, io mem 0xee0a0000 [ 2.016534] hub 3-0:1.0: USB hub found [ 2.020298] hub 3-0:1.0: 1 port detected [ 2.024438] ohci-platform 
ee0c0000.usb: Generic Platform OHCI controller [ 2.031147] ohci-platform ee0c0000.usb: new USB bus registered, assigned bus number 4 [ 2.039026] ohci-platform ee0c0000.usb: irq 166, io mem 0xee0c0000 [ 2.132519] hub 4-0:1.0: USB hub found [ 2.136281] hub 4-0:1.0: 1 port detected [ 2.140417] ohci-exynos: OHCI EXYNOS driver [ 2.145110] xhci-hcd ee000000.usb: xHCI Host Controller [ 2.150344] xhci-hcd ee000000.usb: new USB bus registered, assigned bus number 5 [ 2.157782] xhci-hcd ee000000.usb: Direct firmware load for r8a779x_usb3_v3.dlmem failed with error -2 [ 2.167098] xhci-hcd ee000000.usb: can't setup: -2 [ 2.171895] xhci-hcd ee000000.usb: USB bus 5 deregistered [ 2.177324] xhci-hcd: probe of ee000000.usb failed with error -2 [ 2.183655] usbcore: registered new interface driver usb-storage [ 2.192417] i2c /dev entries driver [ 2.203824] cs2000-cp 2-004f: revision - C1 [ 2.208051] i2c-rcar e6510000.i2c: probed [ 2.212397] pca953x 4-0020: 4-0020 supply vcc not found, using dummy regulator [ 2.220399] i2c-rcar e66d8000.i2c: probed [ 2.231022] rcar_gen3_thermal e6198000.thermal: TSC0: Loaded 1 trip points [ 2.242049] rcar_gen3_thermal e6198000.thermal: TSC1: Loaded 1 trip points [ 2.253051] rcar_gen3_thermal e6198000.thermal: TSC2: Loaded 2 trip points [ 2.262525] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 1499999 KHz [ 2.269954] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 1500000 KHz [ 2.278842] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1199999 KHz [ 2.286537] cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz [ 2.295864] sdhci: Secure Digital Host Controller Interface driver [ 2.302048] sdhci: Copyright(c) Pierre Ossman [ 2.307021] renesas_sdhi_internal_dmac ee100000.sd: Got CD GPIO [ 2.312959] renesas_sdhi_internal_dmac ee100000.sd: Got WP GPIO [ 2.389858] renesas_sdhi_internal_dmac ee140000.sd: IRQ index 1 not found [ 2.396653] renesas_sdhi_internal_dmac ee140000.sd: mmc0 base 
at 0xee140000 max clock rate 200 MHz [ 2.405964] renesas_sdhi_internal_dmac ee160000.sd: Got CD GPIO [ 2.411904] renesas_sdhi_internal_dmac ee160000.sd: Got WP GPIO [ 2.418211] Synopsys Designware Multimedia Card Interface Driver [ 2.425178] sdhci-pltfm: SDHCI platform and OF driver helper [ 2.432494] ledtrig-cpu: registered to indicate activity on CPUs [ 2.439571] usbcore: registered new interface driver usbhid [ 2.445144] usbhid: USB HID core driver [ 2.452453] NET: Registered protocol family 17 [ 2.457011] 9pnet: Installing 9P2000 support [ 2.461319] Key type dns_resolver registered [ 2.465799] registered taskstats version 1 [ 2.469898] Loading compiled-in X.509 certificates [ 2.482897] renesas_irqc e61c0000.interrupt-controller: driving 6 irqs [ 2.495600] bd9571mwv 7-0030: Device: BD9571MWV rev. 1 [ 2.515031] mmc0: new HS400 MMC card at address 0001 [ 2.520418] mmcblk0: mmc0:0001 BGSD3R 29.1 GiB [ 2.525131] mmcblk0boot0: mmc0:0001 BGSD3R partition 1 16.0 MiB [ 2.529575] ehci-platform ee080100.usb: EHCI Host Controller [ 2.531207] mmcblk0boot1: mmc0:0001 BGSD3R partition 2 16.0 MiB [ 2.536718] ehci-platform ee080100.usb: new USB bus registered, assigned bus number 5 [ 2.542734] mmcblk0rpmb: mmc0:0001 BGSD3R partition 3 4.00 MiB, chardev (237:0) [ 2.550499] ehci-platform ee080100.usb: irq 164, io mem 0xee080100 [ 2.558357] mmcblk0: p1 [ 2.577560] ehci-platform ee080100.usb: USB 2.0 started, EHCI 1.10 [ 2.584084] hub 5-0:1.0: USB hub found [ 2.587851] hub 5-0:1.0: 1 port detected [ 2.592849] ohci-platform ee080000.usb: Generic Platform OHCI controller [ 2.599569] ohci-platform ee080000.usb: new USB bus registered, assigned bus number 6 [ 2.607446] ohci-platform ee080000.usb: irq 164, io mem 0xee080000 [ 2.704528] hub 6-0:1.0: USB hub found [ 2.708295] hub 6-0:1.0: 1 port detected [ 2.713342] renesas_sdhi_internal_dmac ee100000.sd: Got CD GPIO [ 2.719283] renesas_sdhi_internal_dmac ee100000.sd: Got WP GPIO [ 2.795713] renesas_sdhi_internal_dmac ee100000.sd: IRQ 
index 1 not found [ 2.802509] renesas_sdhi_internal_dmac ee100000.sd: mmc1 base at 0xee100000 max clock rate 200 MHz [ 2.812389] renesas_sdhi_internal_dmac ee160000.sd: Got CD GPIO [ 2.818337] renesas_sdhi_internal_dmac ee160000.sd: Got WP GPIO [ 2.894683] renesas_sdhi_internal_dmac ee160000.sd: IRQ index 1 not found [ 2.901477] renesas_sdhi_internal_dmac ee160000.sd: mmc2 base at 0xee160000 max clock rate 200 MHz [ 2.914096] rcar-dmac e6700000.dma-controller: ignoring dependency for device, assuming no driver [ 2.925001] rcar-dmac e7300000.dma-controller: ignoring dependency for device, assuming no driver [ 2.935788] rcar-dmac e7310000.dma-controller: ignoring dependency for device, assuming no driver [ 2.946621] rcar-dmac ec700000.dma-controller: ignoring dependency for device, assuming no driver [ 2.957413] rcar-dmac ec720000.dma-controller: ignoring dependency for device, assuming no driver [ 2.968426] sata_rcar ee300000.sata: ignoring dependency for device, assuming no driver [ 2.976875] scsi host0: sata_rcar [ 2.980348] ata1: SATA max UDMA/133 irq 170 [ 2.985299] ravb e6800000.ethernet: ignoring dependency for device, assuming no driver [ 2.993512] libphy: ravb_mii: probed [ 2.998278] ravb e6800000.ethernet eth0: Base address at 0xe6800000, 2e:09:0a:00:83:ea, IRQ 116. 
[ 3.008624] input: keys as /devices/platform/keys/input/input0 [ 3.014713] hctosys: unable to open rtc device (rtc0) [ 3.096510] Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=175) [ 3.401564] ata1: link resume succeeded after 1 retries [ 3.513072] ata1: SATA link down (SStatus 0 SControl 300) [ 4.742059] ravb e6800000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off [ 4.773553] Sending DHCP requests .., [ 7.413975] random: fast init done [ 7.421550] OK [ 7.423320] IP-Config: Got DHCP answer from 192.168.44.74, my address is 192.168.44.104 [ 7.431336] IP-Config: Complete: [ 7.434568] device=eth0, hwaddr=2e:09:0a:00:83:ea, ipaddr=192.168.44.104, mask=255.255.255.0, gw=192.168.44.74 [ 7.445000] host=192.168.44.104, domain=shimoda-i7.org, nis-domain=(none) [ 7.452218] bootserver=192.168.44.74, rootserver=192.168.44.74, rootpath=/var/lib/tftpboot/aarch64/rootfs/buildroot [ 7.452220] nameserver0=192.168.44.74 [ 7.467553] SDHI0 Vcc: disabling [ 7.470782] SDHI3 Vcc: disabling [ 7.474008] SDHI0 VccQ: disabling [ 7.477316] SDHI3 VccQ: disabling [ 7.480632] ALSA device list: [ 7.483598] No soundcards found. [ 7.492496] VFS: Mounted root (nfs filesystem) on device 0:19. 
[ 7.498742] devtmpfs: mounted [ 7.504263] Freeing unused kernel memory: 4992K [ 7.513642] Run /sbin/init as init process [ 7.843871] Unable to handle kernel paging request at virtual address 0000000056000000 [ 7.851797] Mem abort info: [ 7.854589] ESR = 0x96000004 [ 7.857642] EC = 0x25: DABT (current EL), IL = 32 bits [ 7.862950] SET = 0, FnV = 0 [ 7.866001] EA = 0, S1PTW = 0 [ 7.869134] Data abort info: [ 7.872011] ISV = 0, ISS = 0x00000004 [ 7.875842] CM = 0, WnR = 0 [ 7.878806] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000774787000 [ 7.885242] [0000000056000000] pgd=0000000000000000 [ 7.890119] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 7.895684] Modules linked in: [ 7.898737] CPU: 2 PID: 1 Comm: systemd Not tainted 5.3.0-rc6-next-20190902-00001-g9709468 #48 [ 7.907340] Hardware name: Renesas Salvator-X board based on r8a7795 ES2.0+ (DT) [ 7.914729] pstate: 20000005 (nzCv daif -PAN -UAO) [ 7.919523] pc : dput+0x38/0x2e8 [ 7.922743] lr : dput+0x34/0x2e8 [ 7.925964] sp : ffff80001006bba0 [ 7.929270] x29: ffff80001006bba0 x28: ffff000735c98000 [ 7.934576] x27: 0000000000000000 x26: 0000000000000000 [ 7.939881] x25: 0000000056000000 x24: 0000000000004000 [ 7.945186] x23: 0000000000000001 x22: 0000000000080060 [ 7.950491] x21: 0000000000080040 x20: 0000000056000058 [ 7.955795] x19: 0000000056000000 x18: 0000000000000000 [ 7.961099] x17: 0000000000000000 x16: 0000000000000000 [ 7.966403] x15: 0000000000000000 x14: 0000000000000000 [ 7.971707] x13: 0000000000000000 x12: fefefefefefefeff [ 7.977011] x11: 0000ffffa01018b8 x10: 0000ffffa01018b8 [ 7.982315] x9 : 6bff3a3a375c19ff x8 : 00ffffa01018b800 [ 7.987620] x7 : 0000000000000000 x6 : 0000000000000000 [ 7.992924] x5 : 0000000000000064 x4 : 0000000c00000000 [ 7.998228] x3 : 0000000000000001 x2 : 0000000000000082 [ 8.003532] x1 : ffff000735c98000 x0 : 0000000000000001 [ 8.008838] Call trace: [ 8.011278] dput+0x38/0x2e8 [ 8.014155] terminate_walk+0xf4/0x120 [ 8.017897] path_lookupat+0xf8/0x1f8 [ 8.021553] 
filename_lookup+0x8c/0x160 [ 8.025382] user_path_at_empty+0x48/0x58 [ 8.029387] __arm64_sys_name_to_handle_at+0x64/0x2d0 [ 8.034435] el0_svc_common+0x68/0x178 [ 8.038177] el0_svc_handler+0x24/0x98 [ 8.041920] el0_svc+0x8/0xc [ 8.044798] Code: 72a00115 52800037 97fb26b4 91016274 (b9400260) [ 8.050895] ---[ end trace dd06490ec981282b ]--- [ 8.055966] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b [ 8.063619] SMP: stopping secondary CPUs [ 8.067539] Kernel Offset: disabled [ 8.071021] CPU features: 0x0002,21006004 [ 8.075022] Memory Limit: none [ 8.078076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]--- Best regards, Yoshihiro Shimoda
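A note on the first oops ([1]): its call trace runs usb_deregister_bus() -> usb_notify_remove_bus() -> usbdev_notify() -> usbdev_remove() -> destroy_async(), which fits the report that the USB_BUS_REMOVE case passes a struct usb_bus * to a function that expects a struct usb_device *. The user-space sketch below only illustrates that dispatch pattern: the action code is the sole indication of what the void * points to, so each case must treat it as the matching type and stop there. All toy_* names are hypothetical stand-ins rather than kernel API, and this is not claimed to be the fix applied to the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real struct usb_bus and struct usb_device
 * have unrelated layouts, which is what makes mixing them up dangerous. */
struct toy_bus {
	const char *bus_name;
};

struct toy_device {
	const char *dev_name;
	int open_files;		/* state that only a device has */
};

enum toy_action {
	TOY_DEVICE_ADD,
	TOY_DEVICE_REMOVE,
	TOY_BUS_ADD,
	TOY_BUS_REMOVE,
};

static int devices_torn_down;
static int bus_events_seen;

/* Device-only teardown: must never be handed a struct toy_bus. */
static void toy_device_remove(struct toy_device *udev)
{
	udev->open_files = 0;
	devices_torn_down++;
}

/* Bus-only handler: counts events for buses that carry a name. */
static void toy_bus_event(const struct toy_bus *ubus)
{
	if (ubus->bus_name != NULL)
		bus_events_seen++;
}

/* The action code is the only thing that says what 'data' points to. */
static int toy_notify(enum toy_action action, void *data)
{
	switch (action) {
	case TOY_DEVICE_ADD:
		break;
	case TOY_DEVICE_REMOVE:
		toy_device_remove(data);
		break;
	case TOY_BUS_ADD:
	case TOY_BUS_REMOVE:
		toy_bus_event(data);
		break;	/* stop here: 'data' is a bus, not a device */
	}
	return 0;
}
```

Calling toy_notify(TOY_BUS_REMOVE, &bus) in this sketch touches only the bus handler. Routing that same pointer into toy_device_remove() would be the toy equivalent of the path seen in the trace.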
* RE: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7] @ 2019-09-03 8:53 ` Yoshihiro Shimoda 0 siblings, 0 replies; 234+ messages in thread From: Yoshihiro Shimoda @ 2019-09-03 8:53 UTC (permalink / raw) To: David Howells, viro Cc: Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block

Hi,

> From: David Howells, Sent: Friday, August 30, 2019 10:58 PM
<snip>
> diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
> index 9063ede411ae..b8572e4d6a1b 100644
> --- a/drivers/usb/core/devio.c
> +++ b/drivers/usb/core/devio.c
> @@ -41,6 +41,7 @@
>  #include <linux/dma-mapping.h>
>  #include <asm/byteorder.h>
>  #include <linux/moduleparam.h>
> +#include <linux/watch_queue.h>
> 
>  #include "usb.h"
> 
> @@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev)
>  	}
>  }
> 
> +#ifdef CONFIG_USB_NOTIFICATIONS
> +static noinline void post_usb_notification(const char *devname,
> +					   enum usb_notification_type subtype,
> +					   u32 error)
> +{
> +	unsigned int gran = WATCH_LENGTH_GRANULARITY;
> +	unsigned int name_len, n_len;
> +	u64 id = 0; /* Might want to put a dev# here. */
> +
> +	struct {
> +		struct usb_notification n;
> +		char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
> +			       (sizeof(struct usb_notification) -
> +				offsetof(struct usb_notification, name))];
> +	} n;
> +
> +	name_len = strlen(devname);
> +	name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
> +	n_len = round_up(offsetof(struct usb_notification, name) + name_len,
> +			 gran) / gran;
> +
> +	memset(&n, 0, sizeof(n));
> +	memcpy(n.n.name, devname, n_len);
> +
> +	n.n.watch.type		= WATCH_TYPE_USB_NOTIFY;
> +	n.n.watch.subtype	= subtype;
> +	n.n.watch.info		= n_len;
> +	n.n.error		= error;
> +	n.n.name_len		= name_len;
> +
> +	post_device_notification(&n.n.watch, id);
> +}
> +
> +void post_usb_device_notification(const struct usb_device *udev,
> +				  enum usb_notification_type subtype, u32 error)
> +{
> +	post_usb_notification(dev_name(&udev->dev), subtype, error);
> +}
> +
> +void post_usb_bus_notification(const struct usb_bus *ubus,

This function's argument is struct usb_bus *, but ...

> +			       enum usb_notification_type subtype, u32 error)
> +{
> +	post_usb_notification(ubus->bus_name, subtype, error);
> +}
> +#endif
> +
>  static int usbdev_notify(struct notifier_block *self,
>  			       unsigned long action, void *dev)
>  {
>  	switch (action) {
>  	case USB_DEVICE_ADD:
> +		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
>  		break;
>  	case USB_DEVICE_REMOVE:
> +		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
> +		usbdev_remove(dev);
> +		break;
> +	case USB_BUS_ADD:
> +		post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
> +		break;
> +	case USB_BUS_REMOVE:
> +		post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
>  		usbdev_remove(dev);

this function calls usbdev_remove() with incorrect argument if the action
is USB_BUS_REMOVE. So, this seems to cause the following issue [1] on
my environment (R-Car H3 / r8a7795 on next-20190902) [2]. However, I have
no idea how to fix the issue, so I report this issue at the first step.

JFYI, even if I have reverted this patch on next-20190902, other issue
appears [3].

[1] The following panic happened.
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd073]
[    0.000000] Linux version 5.3.0-rc6-next-20190902 (shimoda@shimoda-RB02198) (gcc version 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4] (Linaro GCC 7.4-2019.02)) #47 SMP PREEMPT Tue Sep 3 17:42:01 JST 2019
[    0.000000] Machine model: Renesas Salvator-X board based on r8a7795 ES2.0+
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 32 MiB at 0x00000000be000000
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000048000000-0x000000077fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x77efdb800-0x77efdcfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000048000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000077fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000048000000-0x00000000bfffffff]
[    0.000000]   node   0: [mem 0x0000000500000000-0x000000057fffffff]
[    0.000000]   node   0: [mem 0x0000000600000000-0x000000067fffffff]
[    0.000000]   node   0: [mem 0x0000000700000000-0x000000077fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000048000000-0x000000077fffffff]
[    0.000000] On node 0 totalpages: 2064384
[    0.000000]   DMA32 zone: 7680 pages used for memmap
[    0.000000]   DMA32 zone: 0 pages reserved
[    0.000000]   DMA32 zone: 491520 pages, LIFO batch:63
[    0.000000]   Normal zone: 24576 pages used for memmap
[    0.000000]   Normal zone: 1572864 pages, LIFO batch:63
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] psci: SMC Calling Convention v1.1
[    0.000000] percpu: Embedded 22 pages/cpu s52952 r8192 d28968 u90112
[    0.000000] pcpu-alloc: s52952 r8192 d28968 u90112 alloc=22*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: EL2 vector hardening
[    0.000000] Speculative Store Bypass Disable mitigation not required
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 2032128
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: console=ttySC0,115200 ignore_loglevel consoleblank=0 rw root=/dev/nfs ip=dhcp
[    0.000000] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    0.000000] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] software IO TLB: mapped [mem 0xba000000-0xbe000000] (64MB)
[    0.000000] Memory: 7972368K/8257536K available (12092K kernel code, 1846K rwdata, 6320K rodata, 4992K init, 450K bss, 252400K reserved, 32768K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=8.
[    0.000000] 	Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GIC: Adjusting CPU interface base to 0x00000000f102f000
WyAgICAw LjAwMDAwMF0gR0lDOiBVc2luZyBzcGxpdCBFT0kvRGVhY3RpdmF0ZSBt
b2RlDQpbICAgIDAuMDAwMDAwXSByYW5kb206IGdldF9yYW5kb21fYnl0ZXMgY2FsbGVkIGZyb20g c3RhcnRfa2VybmVsKzB4MmYwLzB4NDkwIHdpdGggY3JuZ19pbml0PTANClsgICAgMC4wMDAwMDBd IGFyY2hfdGltZXI6IGNwMTUgdGltZXIocykgcnVubmluZyBhdCA4LjMzTUh6IChwaHlzKS4NClsg ICAgMC4wMDAwMDBdIGNsb2Nrc291cmNlOiBhcmNoX3N5c19jb3VudGVyOiBtYXNrOiAweGZmZmZm ZmZmZmZmZmZmIG1heF9jeWNsZXM6IDB4MWVjMDI5MjNlLCBtYXhfaWRsZV9uczogNDQwNzk1MjAy MTI1IG5zDQpbICAgIDAuMDAwMDAzXSBzY2hlZF9jbG9jazogNTYgYml0cyBhdCA4TUh6LCByZXNv bHV0aW9uIDEyMG5zLCB3cmFwcyBldmVyeSAyMTk5MDIzMjU1NDk2bnMNClsgICAgMC4wMDAxNDJd IENvbnNvbGU6IGNvbG91ciBkdW1teSBkZXZpY2UgODB4MjUNClsgICAgMC4wMDAyMTFdIENhbGli cmF0aW5nIGRlbGF5IGxvb3AgKHNraXBwZWQpLCB2YWx1ZSBjYWxjdWxhdGVkIHVzaW5nIHRpbWVy IGZyZXF1ZW5jeS4uIDE2LjY2IEJvZ29NSVBTIChscGo9MzMzMzMpDQpbICAgIDAuMDAwMjE4XSBw aWRfbWF4OiBkZWZhdWx0OiAzMjc2OCBtaW5pbXVtOiAzMDENClsgICAgMC4wMDAyNzNdIExTTTog U2VjdXJpdHkgRnJhbWV3b3JrIGluaXRpYWxpemluZw0KWyAgICAwLjAwMDM1MV0gTW91bnQtY2Fj aGUgaGFzaCB0YWJsZSBlbnRyaWVzOiAxNjM4NCAob3JkZXI6IDUsIDEzMTA3MiBieXRlcywgbGlu ZWFyKQ0KWyAgICAwLjAwMDM5N10gTW91bnRwb2ludC1jYWNoZSBoYXNoIHRhYmxlIGVudHJpZXM6 IDE2Mzg0IChvcmRlcjogNSwgMTMxMDcyIGJ5dGVzLCBsaW5lYXIpDQpbICAgIDAuMDIzOTc0XSBB U0lEIGFsbG9jYXRvciBpbml0aWFsaXNlZCB3aXRoIDMyNzY4IGVudHJpZXMNClsgICAgMC4wMzE5 NjNdIHJjdTogSGllcmFyY2hpY2FsIFNSQ1UgaW1wbGVtZW50YXRpb24uDQpbICAgIDAuMDQxMDMx XSBEZXRlY3RlZCBSZW5lc2FzIFItQ2FyIEdlbjMgcjhhNzc5NSBFUzMuMA0KWyAgICAwLjA0MjM1 NF0gRUZJIHNlcnZpY2VzIHdpbGwgbm90IGJlIGF2YWlsYWJsZS4NClsgICAgMC4wNDc5ODldIHNt cDogQnJpbmdpbmcgdXAgc2Vjb25kYXJ5IENQVXMgLi4uDQpbICAgIDAuMDgwMTczXSBEZXRlY3Rl ZCBQSVBUIEktY2FjaGUgb24gQ1BVMQ0KWyAgICAwLjA4MDIxM10gQ1BVMTogQm9vdGVkIHNlY29u ZGFyeSBwcm9jZXNzb3IgMHgwMDAwMDAwMDAxIFsweDQxMWZkMDczXQ0KWyAgICAwLjExMjE5MF0g RGV0ZWN0ZWQgUElQVCBJLWNhY2hlIG9uIENQVTINClsgICAgMC4xMTIyMTBdIENQVTI6IEJvb3Rl ZCBzZWNvbmRhcnkgcHJvY2Vzc29yIDB4MDAwMDAwMDAwMiBbMHg0MTFmZDA3M10NClsgICAgMC4x NDQyMjVdIERldGVjdGVkIFBJUFQgSS1jYWNoZSBvbiBDUFUzDQpbICAgIDAuMTQ0MjQ0XSBDUFUz 
OiBCb290ZWQgc2Vjb25kYXJ5IHByb2Nlc3NvciAweDAwMDAwMDAwMDMgWzB4NDExZmQwNzNdDQpb ICAgIDAuMTc2MjY3XSBDUFUgZmVhdHVyZXM6IGRldGVjdGVkOiBBUk0gZXJyYXR1bSA4NDU3MTkN ClsgICAgMC4xNzYyNzhdIERldGVjdGVkIFZJUFQgSS1jYWNoZSBvbiBDUFU0DQpbICAgIDAuMTc2 MzE2XSBDUFU0OiBCb290ZWQgc2Vjb25kYXJ5IHByb2Nlc3NvciAweDAwMDAwMDAxMDAgWzB4NDEw ZmQwMzRdDQpbICAgIDAuMjA4MjkyXSBEZXRlY3RlZCBWSVBUIEktY2FjaGUgb24gQ1BVNQ0KWyAg ICAwLjIwODMxNl0gQ1BVNTogQm9vdGVkIHNlY29uZGFyeSBwcm9jZXNzb3IgMHgwMDAwMDAwMTAx IFsweDQxMGZkMDM0XQ0KWyAgICAwLjI0MDMzMV0gRGV0ZWN0ZWQgVklQVCBJLWNhY2hlIG9uIENQ VTYNClsgICAgMC4yNDAzNTRdIENQVTY6IEJvb3RlZCBzZWNvbmRhcnkgcHJvY2Vzc29yIDB4MDAw MDAwMDEwMiBbMHg0MTBmZDAzNF0NClsgICAgMC4yNzIzNjVdIERldGVjdGVkIFZJUFQgSS1jYWNo ZSBvbiBDUFU3DQpbICAgIDAuMjcyMzg5XSBDUFU3OiBCb290ZWQgc2Vjb25kYXJ5IHByb2Nlc3Nv ciAweDAwMDAwMDAxMDMgWzB4NDEwZmQwMzRdDQpbICAgIDAuMjcyNDY0XSBzbXA6IEJyb3VnaHQg dXAgMSBub2RlLCA4IENQVXMNClsgICAgMC4yNzI0ODRdIFNNUDogVG90YWwgb2YgOCBwcm9jZXNz b3JzIGFjdGl2YXRlZC4NClsgICAgMC4yNzI0ODhdIENQVSBmZWF0dXJlczogZGV0ZWN0ZWQ6IDMy LWJpdCBFTDAgU3VwcG9ydA0KWyAgICAwLjI3MjQ5M10gQ1BVIGZlYXR1cmVzOiBkZXRlY3RlZDog Q1JDMzIgaW5zdHJ1Y3Rpb25zDQpbICAgIDAuMjgyNjEyXSBDUFU6IEFsbCBDUFUocykgc3RhcnRl ZCBhdCBFTDINClsgICAgMC4yODI2NDRdIGFsdGVybmF0aXZlczogcGF0Y2hpbmcga2VybmVsIGNv ZGUNClsgICAgMC4yODM2NzZdIGRldnRtcGZzOiBpbml0aWFsaXplZA0KWyAgICAwLjI4OTQ1OF0g Y2xvY2tzb3VyY2U6IGppZmZpZXM6IG1hc2s6IDB4ZmZmZmZmZmYgbWF4X2N5Y2xlczogMHhmZmZm ZmZmZiwgbWF4X2lkbGVfbnM6IDc2NDUwNDE3ODUxMDAwMDAgbnMNClsgICAgMC4yODk0NzFdIGZ1 dGV4IGhhc2ggdGFibGUgZW50cmllczogMjA0OCAob3JkZXI6IDUsIDEzMTA3MiBieXRlcywgbGlu ZWFyKQ0KWyAgICAwLjI5MDE2M10gcGluY3RybCBjb3JlOiBpbml0aWFsaXplZCBwaW5jdHJsIHN1 YnN5c3RlbQ0KWyAgICAwLjI5MTM2MF0gRE1JIG5vdCBwcmVzZW50IG9yIGludmFsaWQuDQpbICAg IDAuMjkxNjA3XSBORVQ6IFJlZ2lzdGVyZWQgcHJvdG9jb2wgZmFtaWx5IDE2DQpbICAgIDAuMjky Mzg4XSBETUE6IHByZWFsbG9jYXRlZCAyNTYgS2lCIHBvb2wgZm9yIGF0b21pYyBhbGxvY2F0aW9u cw0KWyAgICAwLjI5MjM5OV0gYXVkaXQ6IGluaXRpYWxpemluZyBuZXRsaW5rIHN1YnN5cyAoZGlz 
YWJsZWQpDQpbICAgIDAuMjkyNTM5XSBhdWRpdDogdHlwZT0yMDAwIGF1ZGl0KDAuMjkyOjEpOiBz dGF0ZT1pbml0aWFsaXplZCBhdWRpdF9lbmFibGVkPTAgcmVzPTENClsgICAgMC4yOTM1NzNdIGNw dWlkbGU6IHVzaW5nIGdvdmVybm9yIG1lbnUNClsgICAgMC4yOTM3MzNdIGh3LWJyZWFrcG9pbnQ6 IGZvdW5kIDYgYnJlYWtwb2ludCBhbmQgNCB3YXRjaHBvaW50IHJlZ2lzdGVycy4NClsgICAgMC4y OTQ2NzhdIFNlcmlhbDogQU1CQSBQTDAxMSBVQVJUIGRyaXZlcg0KWyAgICAwLjI5NjkxMl0gc2gt cGZjIGU2MDYwMDAwLnBpbi1jb250cm9sbGVyOiBJUlEgaW5kZXggMCBub3QgZm91bmQNClsgICAg MC4yOTcxMjVdIHNoLXBmYyBlNjA2MDAwMC5waW4tY29udHJvbGxlcjogcjhhNzc5NTFfcGZjIHN1 cHBvcnQgcmVnaXN0ZXJlZA0KWyAgICAwLjMxNzQzMl0gSHVnZVRMQiByZWdpc3RlcmVkIDEuMDAg R2lCIHBhZ2Ugc2l6ZSwgcHJlLWFsbG9jYXRlZCAwIHBhZ2VzDQpbICAgIDAuMzE3NDM5XSBIdWdl VExCIHJlZ2lzdGVyZWQgMzIuMCBNaUIgcGFnZSBzaXplLCBwcmUtYWxsb2NhdGVkIDAgcGFnZXMN ClsgICAgMC4zMTc0NDNdIEh1Z2VUTEIgcmVnaXN0ZXJlZCAyLjAwIE1pQiBwYWdlIHNpemUsIHBy ZS1hbGxvY2F0ZWQgMCBwYWdlcw0KWyAgICAwLjMxNzQ0N10gSHVnZVRMQiByZWdpc3RlcmVkIDY0 LjAgS2lCIHBhZ2Ugc2l6ZSwgcHJlLWFsbG9jYXRlZCAwIHBhZ2VzDQpbICAgIDAuMzE5MTk5XSBj cnlwdGQ6IG1heF9jcHVfcWxlbiBzZXQgdG8gMTAwMA0KWyAgICAwLjMyMjA5MV0gQUNQSTogSW50 ZXJwcmV0ZXIgZGlzYWJsZWQuDQpbICAgIDAuMzI1NjI3XSBpb21tdTogRGVmYXVsdCBkb21haW4g dHlwZTogVHJhbnNsYXRlZCANClsgICAgMC4zMjU4MjNdIHZnYWFyYjogbG9hZGVkDQpbICAgIDAu MzI2MDExXSBTQ1NJIHN1YnN5c3RlbSBpbml0aWFsaXplZA0KWyAgICAwLjMyNjExM10gbGliYXRh IHZlcnNpb24gMy4wMCBsb2FkZWQuDQpbICAgIDAuMzI2MjQzXSB1c2Jjb3JlOiByZWdpc3RlcmVk IG5ldyBpbnRlcmZhY2UgZHJpdmVyIHVzYmZzDQpbICAgIDAuMzI2MjY0XSB1c2Jjb3JlOiByZWdp c3RlcmVkIG5ldyBpbnRlcmZhY2UgZHJpdmVyIGh1Yg0KWyAgICAwLjMyNjMwN10gdXNiY29yZTog cmVnaXN0ZXJlZCBuZXcgZGV2aWNlIGRyaXZlciB1c2INClsgICAgMC4zMjcyNTVdIGkyYy1zaF9t b2JpbGUgZTYwYjAwMDAuaTJjOiBJMkMgYWRhcHRlciA3LCBidXMgc3BlZWQgNDAwMDAwIEh6DQpb ICAgIDAuMzI3NTYwXSBwcHNfY29yZTogTGludXhQUFMgQVBJIHZlci4gMSByZWdpc3RlcmVkDQpb ICAgIDAuMzI3NTY0XSBwcHNfY29yZTogU29mdHdhcmUgdmVyLiA1LjMuNiAtIENvcHlyaWdodCAy MDA1LTIwMDcgUm9kb2xmbyBHaW9tZXR0aSA8Z2lvbWV0dGlAbGludXguaXQ+DQpbICAgIDAuMzI3 
NTczXSBQVFAgY2xvY2sgc3VwcG9ydCByZWdpc3RlcmVkDQpbICAgIDAuMzI3NzAxXSBFREFDIE1D OiBWZXI6IDMuMC4wDQpbICAgIDAuMzI4OTkxXSBGUEdBIG1hbmFnZXIgZnJhbWV3b3JrDQpbICAg IDAuMzI5MDMxXSBBZHZhbmNlZCBMaW51eCBTb3VuZCBBcmNoaXRlY3R1cmUgRHJpdmVyIEluaXRp YWxpemVkLg0KWyAgICAwLjMyOTQ5N10gY2xvY2tzb3VyY2U6IFN3aXRjaGVkIHRvIGNsb2Nrc291 cmNlIGFyY2hfc3lzX2NvdW50ZXINClsgICAgMC4zMjk2MzldIFZGUzogRGlzayBxdW90YXMgZHF1 b3RfNi42LjANClsgICAgMC4zMjk2ODJdIFZGUzogRHF1b3QtY2FjaGUgaGFzaCB0YWJsZSBlbnRy aWVzOiA1MTIgKG9yZGVyIDAsIDQwOTYgYnl0ZXMpDQpbICAgIDAuMzI5ODAwXSBwbnA6IFBuUCBB Q1BJOiBkaXNhYmxlZA0KWyAgICAwLjMzMjc2NF0gdGhlcm1hbF9zeXM6IFJlZ2lzdGVyZWQgdGhl cm1hbCBnb3Zlcm5vciAnc3RlcF93aXNlJw0KWyAgICAwLjMzMjc2N10gdGhlcm1hbF9zeXM6IFJl Z2lzdGVyZWQgdGhlcm1hbCBnb3Zlcm5vciAncG93ZXJfYWxsb2NhdG9yJw0KWyAgICAwLjMzMzI3 MF0gTkVUOiBSZWdpc3RlcmVkIHByb3RvY29sIGZhbWlseSAyDQpbICAgIDAuMzMzNTU4XSB0Y3Bf bGlzdGVuX3BvcnRhZGRyX2hhc2ggaGFzaCB0YWJsZSBlbnRyaWVzOiA0MDk2IChvcmRlcjogNCwg NjU1MzYgYnl0ZXMsIGxpbmVhcikNClsgICAgMC4zMzM2MjRdIFRDUCBlc3RhYmxpc2hlZCBoYXNo IHRhYmxlIGVudHJpZXM6IDY1NTM2IChvcmRlcjogNywgNTI0Mjg4IGJ5dGVzLCBsaW5lYXIpDQpb ICAgIDAuMzMzOTAzXSBUQ1AgYmluZCBoYXNoIHRhYmxlIGVudHJpZXM6IDY1NTM2IChvcmRlcjog OCwgMTA0ODU3NiBieXRlcywgbGluZWFyKQ0KWyAgICAwLjMzNDQ4OV0gVENQOiBIYXNoIHRhYmxl cyBjb25maWd1cmVkIChlc3RhYmxpc2hlZCA2NTUzNiBiaW5kIDY1NTM2KQ0KWyAgICAwLjMzNDYw Nl0gVURQIGhhc2ggdGFibGUgZW50cmllczogNDA5NiAob3JkZXI6IDUsIDEzMTA3MiBieXRlcywg bGluZWFyKQ0KWyAgICAwLjMzNDcxNF0gVURQLUxpdGUgaGFzaCB0YWJsZSBlbnRyaWVzOiA0MDk2 IChvcmRlcjogNSwgMTMxMDcyIGJ5dGVzLCBsaW5lYXIpDQpbICAgIDAuMzM0OTI5XSBORVQ6IFJl Z2lzdGVyZWQgcHJvdG9jb2wgZmFtaWx5IDENClsgICAgMC4zMzUyOTBdIFJQQzogUmVnaXN0ZXJl ZCBuYW1lZCBVTklYIHNvY2tldCB0cmFuc3BvcnQgbW9kdWxlLg0KWyAgICAwLjMzNTI5NV0gUlBD OiBSZWdpc3RlcmVkIHVkcCB0cmFuc3BvcnQgbW9kdWxlLg0KWyAgICAwLjMzNTI5OV0gUlBDOiBS ZWdpc3RlcmVkIHRjcCB0cmFuc3BvcnQgbW9kdWxlLg0KWyAgICAwLjMzNTMwMl0gUlBDOiBSZWdp c3RlcmVkIHRjcCBORlN2NC4xIGJhY2tjaGFubmVsIHRyYW5zcG9ydCBtb2R1bGUuDQpbICAgIDAu 
MzM1MzExXSBQQ0k6IENMUyAwIGJ5dGVzLCBkZWZhdWx0IDY0DQpbICAgIDAuMzM2MTQxXSBodyBw ZXJmZXZlbnRzOiBlbmFibGVkIHdpdGggYXJtdjhfY29ydGV4X2E1MyBQTVUgZHJpdmVyLCA3IGNv dW50ZXJzIGF2YWlsYWJsZQ0KWyAgICAwLjMzNjM3N10gaHcgcGVyZmV2ZW50czogZW5hYmxlZCB3 aXRoIGFybXY4X2NvcnRleF9hNTcgUE1VIGRyaXZlciwgNyBjb3VudGVycyBhdmFpbGFibGUNClsg ICAgMC4zMzY3OTldIGt2bSBbMV06IElQQSBTaXplIExpbWl0OiA0MGJpdHMNClsgICAgMC4zMzcy NzNdIGt2bSBbMV06IHZnaWMgaW50ZXJydXB0IElSUTENClsgICAgMC4zMzc0MTVdIGt2bSBbMV06 IEh5cCBtb2RlIGluaXRpYWxpemVkIHN1Y2Nlc3NmdWxseQ0KWyAgICAwLjM0MTc3NV0gSW5pdGlh bGlzZSBzeXN0ZW0gdHJ1c3RlZCBrZXlyaW5ncw0KWyAgICAwLjM0MTg2NF0gd29ya2luZ3NldDog dGltZXN0YW1wX2JpdHM9NDQgbWF4X29yZGVyPTIxIGJ1Y2tldF9vcmRlcj0wDQpbICAgIDAuMzQ1 MDc2XSBzcXVhc2hmczogdmVyc2lvbiA0LjAgKDIwMDkvMDEvMzEpIFBoaWxsaXAgTG91Z2hlcg0K WyAgICAwLjM0NTUxNV0gTkZTOiBSZWdpc3RlcmluZyB0aGUgaWRfcmVzb2x2ZXIga2V5IHR5cGUN ClsgICAgMC4zNDU1MzJdIEtleSB0eXBlIGlkX3Jlc29sdmVyIHJlZ2lzdGVyZWQNClsgICAgMC4z NDU1MzVdIEtleSB0eXBlIGlkX2xlZ2FjeSByZWdpc3RlcmVkDQpbICAgIDAuMzQ1NTQ0XSBuZnM0 ZmlsZWxheW91dF9pbml0OiBORlN2NCBGaWxlIExheW91dCBEcml2ZXIgUmVnaXN0ZXJpbmcuLi4N ClsgICAgMC4zNDU2MzhdIDlwOiBJbnN0YWxsaW5nIHY5ZnMgOXAyMDAwIGZpbGUgc3lzdGVtIHN1 cHBvcnQNClsgICAgMC4zNTQ5OTVdIEtleSB0eXBlIGFzeW1tZXRyaWMgcmVnaXN0ZXJlZA0KWyAg ICAwLjM1NTAwMV0gQXN5bW1ldHJpYyBrZXkgcGFyc2VyICd4NTA5JyByZWdpc3RlcmVkDQpbICAg IDAuMzU1MDI3XSBCbG9jayBsYXllciBTQ1NJIGdlbmVyaWMgKGJzZykgZHJpdmVyIHZlcnNpb24g MC40IGxvYWRlZCAobWFqb3IgMjQ1KQ0KWyAgICAwLjM1NTAzMl0gaW8gc2NoZWR1bGVyIG1xLWRl YWRsaW5lIHJlZ2lzdGVyZWQNClsgICAgMC4zNTUwMzZdIGlvIHNjaGVkdWxlciBreWJlciByZWdp c3RlcmVkDQpbICAgIDAuMzU5NjM5XSBwaHlfcmNhcl9nZW4zX3VzYjIgZWUwYTAyMDAudXNiLXBo eTogSVJRIGluZGV4IDAgbm90IGZvdW5kDQpbICAgIDAuMzYwMzQ2XSBwaHlfcmNhcl9nZW4zX3Vz YjIgZWUwYzAyMDAudXNiLXBoeTogSVJRIGluZGV4IDAgbm90IGZvdW5kDQpbICAgIDAuMzY2MDEw XSBncGlvX3JjYXIgZTYwNTAwMDAuZ3BpbzogZHJpdmluZyAxNiBHUElPcw0KWyAgICAwLjM2NjE4 N10gZ3Bpb19yY2FyIGU2MDUxMDAwLmdwaW86IGRyaXZpbmcgMjkgR1BJT3MNClsgICAgMC4zNjYz 
NDhdIGdwaW9fcmNhciBlNjA1MjAwMC5ncGlvOiBkcml2aW5nIDE1IEdQSU9zDQpbICAgIDAuMzY2 NTA0XSBncGlvX3JjYXIgZTYwNTMwMDAuZ3BpbzogZHJpdmluZyAxNiBHUElPcw0KWyAgICAwLjM2 NjY2M10gZ3Bpb19yY2FyIGU2MDU0MDAwLmdwaW86IGRyaXZpbmcgMTggR1BJT3MNClsgICAgMC4z NjY4MTZdIGdwaW9fcmNhciBlNjA1NTAwMC5ncGlvOiBkcml2aW5nIDI2IEdQSU9zDQpbICAgIDAu MzY2OTczXSBncGlvX3JjYXIgZTYwNTU0MDAuZ3BpbzogZHJpdmluZyAzMiBHUElPcw0KWyAgICAw LjM2NzEyNl0gZ3Bpb19yY2FyIGU2MDU1ODAwLmdwaW86IGRyaXZpbmcgNCBHUElPcw0KWyAgICAw LjM2ODU3MV0gcmNhci1wY2llIGZlMDAwMDAwLnBjaWU6IGhvc3QgYnJpZGdlIC9zb2MvcGNpZUBm ZTAwMDAwMCByYW5nZXM6DQpbICAgIDAuMzY4NTk2XSByY2FyLXBjaWUgZmUwMDAwMDAucGNpZTog ICAgSU8gMHhmZTEwMDAwMC4uMHhmZTFmZmZmZiAtPiAweDAwMDAwMDAwDQpbICAgIDAuMzY4NjEz XSByY2FyLXBjaWUgZmUwMDAwMDAucGNpZTogICBNRU0gMHhmZTIwMDAwMC4uMHhmZTNmZmZmZiAt PiAweGZlMjAwMDAwDQpbICAgIDAuMzY4NjI2XSByY2FyLXBjaWUgZmUwMDAwMDAucGNpZTogICBN RU0gMHgzMDAwMDAwMC4uMHgzN2ZmZmZmZiAtPiAweDMwMDAwMDAwDQpbICAgIDAuMzY4NjM1XSBy Y2FyLXBjaWUgZmUwMDAwMDAucGNpZTogICBNRU0gMHgzODAwMDAwMC4uMHgzZmZmZmZmZiAtPiAw eDM4MDAwMDAwDQpbICAgIDAuNDMzMDAzXSByY2FyLXBjaWUgZmUwMDAwMDAucGNpZTogUENJZSBs aW5rIGRvd24NClsgICAgMC40MzMxNDhdIHJjYXItcGNpZSBlZTgwMDAwMC5wY2llOiBob3N0IGJy aWRnZSAvc29jL3BjaWVAZWU4MDAwMDAgcmFuZ2VzOg0KWyAgICAwLjQzMzE2NV0gcmNhci1wY2ll IGVlODAwMDAwLnBjaWU6ICAgIElPIDB4ZWU5MDAwMDAuLjB4ZWU5ZmZmZmYgLT4gMHgwMDAwMDAw MA0KWyAgICAwLjQzMzE3OV0gcmNhci1wY2llIGVlODAwMDAwLnBjaWU6ICAgTUVNIDB4ZWVhMDAw MDAuLjB4ZWViZmZmZmYgLT4gMHhlZWEwMDAwMA0KWyAgICAwLjQzMzE5MV0gcmNhci1wY2llIGVl ODAwMDAwLnBjaWU6ICAgTUVNIDB4YzAwMDAwMDAuLjB4YzdmZmZmZmYgLT4gMHhjMDAwMDAwMA0K WyAgICAwLjQzMzIwMF0gcmNhci1wY2llIGVlODAwMDAwLnBjaWU6ICAgTUVNIDB4YzgwMDAwMDAu LjB4Y2ZmZmZmZmYgLT4gMHhjODAwMDAwMA0KWyAgICAwLjQ5Njk4NV0gcmNhci1wY2llIGVlODAw MDAwLnBjaWU6IFBDSWUgbGluayBkb3duDQpbICAgIDAuNDk4ODkzXSBFSU5KOiBBQ1BJIGRpc2Fi bGVkLg0KWyAgICAwLjUxMDQzMF0gU2VyaWFsOiA4MjUwLzE2NTUwIGRyaXZlciwgNCBwb3J0cywg SVJRIHNoYXJpbmcgZW5hYmxlZA0KWyAgICAwLjUxMjI0Nl0gU3VwZXJIIChIKVNDSShGKSBkcml2 
ZXIgaW5pdGlhbGl6ZWQNClsgICAgMC41MTI1NjhdIHNoLXNjaSBlNjU1MDAwMC5zZXJpYWw6IElS USBpbmRleCAxIG5vdCBmb3VuZA0KWyAgICAwLjUxMjU3N10gc2gtc2NpIGU2NTUwMDAwLnNlcmlh bDogSVJRIGluZGV4IDIgbm90IGZvdW5kDQpbICAgIDAuNTEyNTg0XSBzaC1zY2kgZTY1NTAwMDAu c2VyaWFsOiBJUlEgaW5kZXggMyBub3QgZm91bmQNClsgICAgMC41MTI1OTFdIHNoLXNjaSBlNjU1 MDAwMC5zZXJpYWw6IElSUSBpbmRleCA0IG5vdCBmb3VuZA0KWyAgICAwLjUxMjU5N10gc2gtc2Np IGU2NTUwMDAwLnNlcmlhbDogSVJRIGluZGV4IDUgbm90IGZvdW5kDQpbICAgIDAuNTEyNjQ3XSBl NjU1MDAwMC5zZXJpYWw6IHR0eVNDMSBhdCBNTUlPIDB4ZTY1NTAwMDAgKGlycSA9IDM0LCBiYXNl X2JhdWQgPSAwKSBpcyBhIGhzY2lmDQpbICAgIDAuNTEzMDY1XSBzaC1zY2kgZTZlODgwMDAuc2Vy aWFsOiBJUlEgaW5kZXggMSBub3QgZm91bmQNClsgICAgMC41MTMwNzNdIHNoLXNjaSBlNmU4ODAw MC5zZXJpYWw6IElSUSBpbmRleCAyIG5vdCBmb3VuZA0KWyAgICAwLjUxMzA3OV0gc2gtc2NpIGU2 ZTg4MDAwLnNlcmlhbDogSVJRIGluZGV4IDMgbm90IGZvdW5kDQpbICAgIDAuNTEzMDg2XSBzaC1z Y2kgZTZlODgwMDAuc2VyaWFsOiBJUlEgaW5kZXggNCBub3QgZm91bmQNClsgICAgMC41MTMwOTJd IHNoLXNjaSBlNmU4ODAwMC5zZXJpYWw6IElSUSBpbmRleCA1IG5vdCBmb3VuZA0KWyAgICAwLjUx MzExOV0gZTZlODgwMDAuc2VyaWFsOiB0dHlTQzAgYXQgTU1JTyAweGU2ZTg4MDAwIChpcnEgPSAx MTksIGJhc2VfYmF1ZCA9IDApIGlzIGEgc2NpZg0KWyAgICAxLjY1NTY5NV0gcHJpbnRrOiBjb25z b2xlIFt0dHlTQzBdIGVuYWJsZWQNClsgICAgMS42NjA3MDZdIG1zbV9zZXJpYWw6IGRyaXZlciBp bml0aWFsaXplZA0KWyAgICAxLjY3MTU0NF0gbG9vcDogbW9kdWxlIGxvYWRlZA0KWyAgICAxLjY3 OTQ4Ml0gbGlicGh5OiBGaXhlZCBNRElPIEJ1czogcHJvYmVkDQpbICAgIDEuNjgzNzE5XSB0dW46 IFVuaXZlcnNhbCBUVU4vVEFQIGRldmljZSBkcml2ZXIsIDEuNg0KWyAgICAxLjY4OTU1OV0gdGh1 bmRlcl94Y3YsIHZlciAxLjANClsgICAgMS42OTI4MDVdIHRodW5kZXJfYmd4LCB2ZXIgMS4wDQpb ICAgIDEuNjk2MDUyXSBuaWNwZiwgdmVyIDEuMA0KWyAgICAxLjY5OTM3M10gaGNsZ2UgaXMgaW5p dGlhbGl6aW5nDQpbICAgIDEuNzAyNjg4XSBobnMzOiBIaXNpbGljb24gRXRoZXJuZXQgTmV0d29y ayBEcml2ZXIgZm9yIEhpcDA4IEZhbWlseSAtIHZlcnNpb24NClsgICAgMS43MDk5MDddIGhuczM6 IENvcHlyaWdodCAoYykgMjAxNyBIdWF3ZWkgQ29ycG9yYXRpb24uDQpbICAgIDEuNzE1MjQyXSBl MTAwMGU6IEludGVsKFIpIFBSTy8xMDAwIE5ldHdvcmsgRHJpdmVyIC0gMy4yLjYtaw0KWyAgICAx 
LjcyMTA3M10gZTEwMDBlOiBDb3B5cmlnaHQoYykgMTk5OSAtIDIwMTUgSW50ZWwgQ29ycG9yYXRp b24uDQpbICAgIDEuNzI3MDEyXSBpZ2I6IEludGVsKFIpIEdpZ2FiaXQgRXRoZXJuZXQgTmV0d29y ayBEcml2ZXIgLSB2ZXJzaW9uIDUuNi4wLWsNClsgICAgMS43MzM5NzFdIGlnYjogQ29weXJpZ2h0 IChjKSAyMDA3LTIwMTQgSW50ZWwgQ29ycG9yYXRpb24uDQpbICAgIDEuNzM5NTU3XSBpZ2J2Zjog SW50ZWwoUikgR2lnYWJpdCBWaXJ0dWFsIEZ1bmN0aW9uIE5ldHdvcmsgRHJpdmVyIC0gdmVyc2lv biAyLjQuMC1rDQpbICAgIDEuNzQ3MzgzXSBpZ2J2ZjogQ29weXJpZ2h0IChjKSAyMDA5IC0gMjAx MiBJbnRlbCBDb3Jwb3JhdGlvbi4NClsgICAgMS43NTM2MzZdIHNreTI6IGRyaXZlciB2ZXJzaW9u IDEuMzANClsgICAgMS43NTgyNjRdIFZGSU8gLSBVc2VyIExldmVsIG1ldGEtZHJpdmVyIHZlcnNp b246IDAuMw0KWyAgICAxLjc2NDc4M10gZWhjaV9oY2Q6IFVTQiAyLjAgJ0VuaGFuY2VkJyBIb3N0 IENvbnRyb2xsZXIgKEVIQ0kpIERyaXZlcg0KWyAgICAxLjc3MTMyMF0gZWhjaS1wY2k6IEVIQ0kg UENJIHBsYXRmb3JtIGRyaXZlcg0KWyAgICAxLjc3NTc4MF0gZWhjaS1wbGF0Zm9ybTogRUhDSSBn ZW5lcmljIHBsYXRmb3JtIGRyaXZlcg0KWyAgICAxLjc4MTMzNV0gZWhjaS1wbGF0Zm9ybSBlZTBh MDEwMC51c2I6IEVIQ0kgSG9zdCBDb250cm9sbGVyDQpbICAgIDEuNzg3MDE2XSBlaGNpLXBsYXRm b3JtIGVlMGEwMTAwLnVzYjogbmV3IFVTQiBidXMgcmVnaXN0ZXJlZCwgYXNzaWduZWQgYnVzIG51 bWJlciAxDQpbICAgIDEuNzk0OTM1XSBlaGNpLXBsYXRmb3JtIGVlMGEwMTAwLnVzYjogaXJxIDE2 NSwgaW8gbWVtIDB4ZWUwYTAxMDANClsgICAgMS44MTM1MDddIGVoY2ktcGxhdGZvcm0gZWUwYTAx MDAudXNiOiBVU0IgMi4wIHN0YXJ0ZWQsIEVIQ0kgMS4xMA0KWyAgICAxLjgyMDA0NF0gaHViIDEt MDoxLjA6IFVTQiBodWIgZm91bmQNClsgICAgMS44MjM4MjhdIGh1YiAxLTA6MS4wOiAxIHBvcnQg ZGV0ZWN0ZWQNClsgICAgMS44MjgwMTddIGVoY2ktcGxhdGZvcm0gZWUwYzAxMDAudXNiOiBFSENJ IEhvc3QgQ29udHJvbGxlcg0KWyAgICAxLjgzMzY4NF0gZWhjaS1wbGF0Zm9ybSBlZTBjMDEwMC51 c2I6IG5ldyBVU0IgYnVzIHJlZ2lzdGVyZWQsIGFzc2lnbmVkIGJ1cyBudW1iZXIgMg0KWyAgICAx Ljg0MTU2MF0gZWhjaS1wbGF0Zm9ybSBlZTBjMDEwMC51c2I6IGlycSAxNjYsIGlvIG1lbSAweGVl MGMwMTAwDQpbICAgIDEuODYxNTA2XSBlaGNpLXBsYXRmb3JtIGVlMGMwMTAwLnVzYjogVVNCIDIu MCBzdGFydGVkLCBFSENJIDEuMTANClsgICAgMS44Njc5NDBdIGh1YiAyLTA6MS4wOiBVU0IgaHVi IGZvdW5kDQpbICAgIDEuODcxNzA0XSBodWIgMi0wOjEuMDogMSBwb3J0IGRldGVjdGVkDQpbICAg 
IDEuODc1ODYwXSBlaGNpLW9yaW9uOiBFSENJIG9yaW9uIGRyaXZlcg0KWyAgICAxLjg4MDA0OV0g ZWhjaS1leHlub3M6IEVIQ0kgRVhZTk9TIGRyaXZlcg0KWyAgICAxLjg4NDMyMF0gb2hjaV9oY2Q6 IFVTQiAxLjEgJ09wZW4nIEhvc3QgQ29udHJvbGxlciAoT0hDSSkgRHJpdmVyDQpbICAgIDEuODkw NTA5XSBvaGNpLXBjaTogT0hDSSBQQ0kgcGxhdGZvcm0gZHJpdmVyDQpbICAgIDEuODk0OTc4XSBv aGNpLXBsYXRmb3JtOiBPSENJIGdlbmVyaWMgcGxhdGZvcm0gZHJpdmVyDQpbICAgIDEuOTAwNDQ0 XSBvaGNpLXBsYXRmb3JtIGVlMGEwMDAwLnVzYjogR2VuZXJpYyBQbGF0Zm9ybSBPSENJIGNvbnRy b2xsZXINClsgICAgMS45MDcxNTldIG9oY2ktcGxhdGZvcm0gZWUwYTAwMDAudXNiOiBuZXcgVVNC IGJ1cyByZWdpc3RlcmVkLCBhc3NpZ25lZCBidXMgbnVtYmVyIDMNClsgICAgMS45MTUwMjVdIG9o Y2ktcGxhdGZvcm0gZWUwYTAwMDAudXNiOiBpcnEgMTY1LCBpbyBtZW0gMHhlZTBhMDAwMA0KWyAg ICAyLjAwODQ3N10gaHViIDMtMDoxLjA6IFVTQiBodWIgZm91bmQNClsgICAgMi4wMTIyNDRdIGh1 YiAzLTA6MS4wOiAxIHBvcnQgZGV0ZWN0ZWQNClsgICAgMi4wMTYzODhdIG9oY2ktcGxhdGZvcm0g ZWUwYzAwMDAudXNiOiBHZW5lcmljIFBsYXRmb3JtIE9IQ0kgY29udHJvbGxlcg0KWyAgICAyLjAy MzA5N10gb2hjaS1wbGF0Zm9ybSBlZTBjMDAwMC51c2I6IG5ldyBVU0IgYnVzIHJlZ2lzdGVyZWQs IGFzc2lnbmVkIGJ1cyBudW1iZXIgNA0KWyAgICAyLjAzMDk3N10gb2hjaS1wbGF0Zm9ybSBlZTBj MDAwMC51c2I6IGlycSAxNjYsIGlvIG1lbSAweGVlMGMwMDAwDQpbICAgIDIuMTI0NDU3XSBodWIg NC0wOjEuMDogVVNCIGh1YiBmb3VuZA0KWyAgICAyLjEyODIyMF0gaHViIDQtMDoxLjA6IDEgcG9y dCBkZXRlY3RlZA0KWyAgICAyLjEzMjM2MV0gb2hjaS1leHlub3M6IE9IQ0kgRVhZTk9TIGRyaXZl cg0KWyAgICAyLjEzNzA2OV0geGhjaS1oY2QgZWUwMDAwMDAudXNiOiB4SENJIEhvc3QgQ29udHJv bGxlcg0KWyAgICAyLjE0MjMwNV0geGhjaS1oY2QgZWUwMDAwMDAudXNiOiBuZXcgVVNCIGJ1cyBy ZWdpc3RlcmVkLCBhc3NpZ25lZCBidXMgbnVtYmVyIDUNClsgICAgMi4xNDk3NDhdIHhoY2ktaGNk IGVlMDAwMDAwLnVzYjogRGlyZWN0IGZpcm13YXJlIGxvYWQgZm9yIHI4YTc3OXhfdXNiM192My5k bG1lbSBmYWlsZWQgd2l0aCBlcnJvciAtMg0KWyAgICAyLjE1OTA2M10geGhjaS1oY2QgZWUwMDAw MDAudXNiOiBjYW4ndCBzZXR1cDogLTINClsgICAgMi4xNjM4NjFdIHhoY2ktaGNkIGVlMDAwMDAw LnVzYjogVVNCIGJ1cyA1IGRlcmVnaXN0ZXJlZA0KWyAgICAyLjE2OTI2Nl0gVW5hYmxlIHRvIGhh bmRsZSBrZXJuZWwgTlVMTCBwb2ludGVyIGRlcmVmZXJlbmNlIGF0IHZpcnR1YWwgYWRkcmVzcyAw 
MDAwMDAwMDAwMDAwMDIwDQpbICAgIDIuMTc4MDQyXSBNZW0gYWJvcnQgaW5mbzoNClsgICAgMi4x ODA4MjhdICAgRVNSID0gMHg5NjAwMDAwNA0KWyAgICAyLjE4Mzg3Nl0gICBFQyA9IDB4MjU6IERB QlQgKGN1cnJlbnQgRUwpLCBJTCA9IDMyIGJpdHMNClsgICAgMi4xODkxNzldICAgU0VUID0gMCwg Rm5WID0gMA0KWyAgICAyLjE5MjIyNl0gICBFQSA9IDAsIFMxUFRXID0gMA0KWyAgICAyLjE5NTM1 OF0gRGF0YSBhYm9ydCBpbmZvOg0KWyAgICAyLjE5ODIzMV0gICBJU1YgPSAwLCBJU1MgPSAweDAw MDAwMDA0DQpbICAgIDIuMjAyMDU4XSAgIENNID0gMCwgV25SID0gMA0KWyAgICAyLjIwNTAxOV0g WzAwMDAwMDAwMDAwMDAwMjBdIHVzZXIgYWRkcmVzcyBidXQgYWN0aXZlX21tIGlzIHN3YXBwZXIN ClsgICAgMi4yMTEzNjZdIEludGVybmFsIGVycm9yOiBPb3BzOiA5NjAwMDAwNCBbIzFdIFBSRUVN UFQgU01QDQpbICAgIDIuMjE2OTMwXSBNb2R1bGVzIGxpbmtlZCBpbjoNClsgICAgMi4yMTk5ODFd IENQVTogMiBQSUQ6IDEgQ29tbTogc3dhcHBlci8wIE5vdCB0YWludGVkIDUuMy4wLXJjNi1uZXh0 LTIwMTkwOTAyICM0Nw0KWyAgICAyLjIyNzQ1Nl0gSGFyZHdhcmUgbmFtZTogUmVuZXNhcyBTYWx2 YXRvci1YIGJvYXJkIGJhc2VkIG9uIHI4YTc3OTUgRVMyLjArIChEVCkNClsgICAgMi4yMzQ4NDRd IHBzdGF0ZTogYTAwMDAwODUgKE56Q3YgZGFJZiAtUEFOIC1VQU8pDQpbICAgIDIuMjM5NjM4XSBw YyA6IF9yYXdfd3JpdGVfbG9jaysweDY4LzB4Mjg4DQpbICAgIDIuMjQzODE5XSBsciA6IGRlc3Ry b3lfYXN5bmMrMHgyMC8weGIwDQpbICAgIDIuMjQ3NzMzXSBzcCA6IGZmZmY4MDAwMTAwNmI5YjAN ClsgICAgMi4yNTEwNDBdIHgyOTogZmZmZjgwMDAxMDA2YjliMCB4Mjg6IGZmZmY4MDAwMTExODZm ZDggDQpbICAgIDIuMjU2MzQ1XSB4Mjc6IGZmZmY4MDAwMTExODZmYzAgeDI2OiAwMDAwMDAwMGZm ZmZmZmVkIA0KWyAgICAyLjI2MTY1MF0geDI1OiBmZmZmODAwMDExOGZmNTcwIHgyNDogZmZmZjgw MDAxMThmZjAwMCANClsgICAgMi4yNjY5NTVdIHgyMzogMDAwMDAwMDAwMDAwMDAwNCB4MjI6IGZm ZmY4MDAwMTE5MDAwMDAgDQpbICAgIDIuMjcyMjU5XSB4MjE6IDAwMDAwMDAwMDAwMDAwMjAgeDIw OiAwMDAwMDAwMDAwMDAwMDI4IA0KWyAgICAyLjI3NzU2NF0geDE5OiAwMDAwMDAwMDAwMDAwMDAw IHgxODogMDAwMDAwMDAwMDAwMDAwNSANClsgICAgMi4yODI4NjhdIHgxNzogMDAwMDAwMDAwMDAw MDAyMCB4MTY6IGZmZmY4MDAwMTBkMTY0YjAgDQpbICAgIDIuMjg4MTcyXSB4MTU6IGZmZmY4MDAw MTE4ZmY2ZTggeDE0OiBmZmZmMDAwNzM1ODY3OTU4IA0KWyAgICAyLjI5MzQ3Nl0geDEzOiAwMDAw MDAwMDAwMDAwMDAwIHgxMjogZmZmZjgwMDAxMThmZjZlOCANClsgICAgMi4yOTg3NzldIHgxMTog 
ZmZmZjAwMDczNTg2NzkwOCB4MTA6IDAwMDAwMDAwMDAwMDAwNDAgDQpbICAgIDIuMzA0MDgzXSB4 OSA6IGZmZmY4MDAwMTE4ZmY2ZjAgeDggOiBmZmZmODAwMDExOGZmNmU4IA0KWyAgICAyLjMwOTM4 OF0geDcgOiBmZmZmMDAwNzM1ODY3OTU4IHg2IDogMDAwMDAwMDAwMDAwMDAwMCANClsgICAgMi4z MTQ2OTFdIHg1IDogMDAwMDAwMDAwMDAwMDAwMSB4NCA6IDAwMDAwMDAwMDAwMDAwMDAgDQpbICAg IDIuMzE5OTk1XSB4MyA6IDAwMDAwMDAwMDAwMDAwMjAgeDIgOiAwMDAwMDAwMDAwMDAwMDAxIA0K WyAgICAyLjMyNTI5OV0geDEgOiAwMDAwMDAwMDAwMDAwMDAwIHgwIDogMDAwMDAwMDAwMDAwMDAw MSANClsgICAgMi4zMzA2MDRdIENhbGwgdHJhY2U6DQpbICAgIDIuMzMzMDQ1XSAgX3Jhd193cml0 ZV9sb2NrKzB4NjgvMHgyODgNClsgICAgMi4zMzY4NzRdICBkZXN0cm95X2FzeW5jKzB4MjAvMHhi MA0KWyAgICAyLjM0MDQ0M10gIHVzYmRldl9yZW1vdmUrMHgzYy8weGMwDQpbICAgIDIuMzQ0MDEx XSAgdXNiZGV2X25vdGlmeSsweDIwLzB4MzgNClsgICAgMi4zNDc1ODNdICBub3RpZmllcl9jYWxs X2NoYWluKzB4NTQvMHg5OA0KWyAgICAyLjM1MTY3Ml0gIGJsb2NraW5nX25vdGlmaWVyX2NhbGxf Y2hhaW4rMHg0OC8weDcwDQpbICAgIDIuMzU2NTQzXSAgdXNiX25vdGlmeV9yZW1vdmVfYnVzKzB4 MWMvMHgyOA0KWyAgICAyLjM2MDgwOF0gIHVzYl9kZXJlZ2lzdGVyX2J1cysweDU4LzB4NjgNClsg ICAgMi4zNjQ4MTFdICB1c2JfYWRkX2hjZCsweDIzNC8weDczMA0KWyAgICAyLjM2ODM4MV0gIHho Y2lfcGxhdF9wcm9iZSsweDRlYy8weDY1MA0KWyAgICAyLjM3MjMwMl0gIHBsYXRmb3JtX2Rydl9w cm9iZSsweDUwLzB4YTANClsgICAgMi4zNzYzMDVdICByZWFsbHlfcHJvYmUrMHhkYy8weDM1MA0K WyAgICAyLjM3OTg3NF0gIGRyaXZlcl9wcm9iZV9kZXZpY2UrMHg1OC8weDEwMA0KWyAgICAyLjM4 NDA1MF0gIGRldmljZV9kcml2ZXJfYXR0YWNoKzB4NmMvMHg5MA0KWyAgICAyLjM4ODIyNl0gIF9f ZHJpdmVyX2F0dGFjaCsweDg0LzB4YzgNClsgICAgMi4zOTE5NjhdICBidXNfZm9yX2VhY2hfZGV2 KzB4NzQvMHhjOA0KWyAgICAyLjM5NTc5Nl0gIGRyaXZlcl9hdHRhY2grMHgyMC8weDI4DQpbICAg IDIuMzk5MzY1XSAgYnVzX2FkZF9kcml2ZXIrMHgxNDgvMHgxZjANClsgICAgMi40MDMxOTNdICBk cml2ZXJfcmVnaXN0ZXIrMHg2MC8weDExMA0KWyAgICAyLjQwNzAyMl0gIF9fcGxhdGZvcm1fZHJp dmVyX3JlZ2lzdGVyKzB4NDAvMHg0OA0KWyAgICAyLjQxMTcyM10gIHhoY2lfcGxhdF9pbml0KzB4 MmMvMHgzNA0KWyAgICAyLjQxNTM4MF0gIGRvX29uZV9pbml0Y2FsbCsweDVjLzB4MWIwDQpbICAg IDIuNDE5MjEzXSAga2VybmVsX2luaXRfZnJlZWFibGUrMHgxYTQvMHgyNGMNClsgICAgMi40MjM1 
NjRdICBrZXJuZWxfaW5pdCsweDEwLzB4MTA4DQpbICAgIDIuNDI3MDQ1XSAgcmV0X2Zyb21fZm9y aysweDEwLzB4MTgNClsgICAgMi40MzA2MTddIENvZGU6IDk3ZDNmMmI2IGE4YzE3YmZkIGQ2NWYw M2MwIGY5ODAwMDcxICg4ODVmZmM2MCkgDQpbICAgIDIuNDM2NzE3XSAtLS1bIGVuZCB0cmFjZSAz M2U0ZmIzNDllYjQ4MDQ3IF0tLS0NClsgICAgMi40NDEzNDVdIG5vdGU6IHN3YXBwZXIvMFsxXSBl eGl0ZWQgd2l0aCBwcmVlbXB0X2NvdW50IDENClsgICAgMi40NDY4NDZdIEtlcm5lbCBwYW5pYyAt IG5vdCBzeW5jaW5nOiBBdHRlbXB0ZWQgdG8ga2lsbCBpbml0ISBleGl0Y29kZT0weDAwMDAwMDBi DQpbICAgIDIuNDU0NDk3XSBTTVA6IHN0b3BwaW5nIHNlY29uZGFyeSBDUFVzDQpbICAgIDIuNDU4 NDE2XSBLZXJuZWwgT2Zmc2V0OiBkaXNhYmxlZA0KWyAgICAyLjQ2MTg5OF0gQ1BVIGZlYXR1cmVz OiAweDAwMDIsMjEwMDYwMDQNClsgICAgMi40NjU4OTldIE1lbW9yeSBMaW1pdDogbm9uZQ0KWyAg ICAyLjQ2ODk1MF0gLS0tWyBlbmQgS2VybmVsIHBhbmljIC0gbm90IHN5bmNpbmc6IEF0dGVtcHRl ZCB0byBraWxsIGluaXQhIGV4aXRjb2RlPTB4MDAwMDAwMGIgXS0tLQ0KDQpbMl0gSSdtIHVzaW5n IGRlZmNvbmZpZyBvbiBhcmNoL2FybTY0IGFuZCBkaXNhYmxlIENPTkZJR19GV19MT0FERVJfVVNF Ul9IRUxQRVIuDQoNClszXSBUaGUgZm9sbG93aW5nIHBhbmljIGhhcHBlbmVkIHdoZW4gSSByZXZl cnRlZCB0aGUgY29tbWl0IGVmOWNjMjU1Yzk1MzkyODhmMTE5MTU2NDEyZDIzYTRiNzg1ZjM1OTkN CiAgICBvbiBuZXh0LTIwMTkwOTAyLg0KWyAgICAwLjAwMDAwMF0gQm9vdGluZyBMaW51eCBvbiBw aHlzaWNhbCBDUFUgMHgwMDAwMDAwMDAwIFsweDQxMWZkMDczXQ0KWyAgICAwLjAwMDAwMF0gTGlu dXggdmVyc2lvbiA1LjMuMC1yYzYtbmV4dC0yMDE5MDkwMi0wMDAwMS1nOTcwOTQ2OCAoc2hpbW9k YUBzaGltb2RhLVJCMDIxOTgpIChnY2MgdmVyc2lvbiA3LjQuMSAyMDE4MTIxMyBbbGluYXJvLTcu NC0yMDE5LjAyIHJldmlzaW9uIDU2ZWM2ZjZiOTljYzE2N2ZmMGMyZjhlMWEyZWVkMzNiMWVkYzg1 ZDRdIChMaW5hcm8gR0NDIDcuNC0yMDE5LjAyKSkgIzQ4IFNNUCBQUkVFTVBUIFR1ZSBTZXAgMyAx Nzo0Njo1NCBKU1QgMjAxOQ0KWyAgICAwLjAwMDAwMF0gTWFjaGluZSBtb2RlbDogUmVuZXNhcyBT YWx2YXRvci1YIGJvYXJkIGJhc2VkIG9uIHI4YTc3OTUgRVMyLjArDQpbICAgIDAuMDAwMDAwXSBw cmludGs6IGRlYnVnOiBpZ25vcmluZyBsb2dsZXZlbCBzZXR0aW5nLg0KWyAgICAwLjAwMDAwMF0g ZWZpOiBHZXR0aW5nIEVGSSBwYXJhbWV0ZXJzIGZyb20gRkRUOg0KWyAgICAwLjAwMDAwMF0gZWZp OiBVRUZJIG5vdCBmb3VuZC4NClsgICAgMC4wMDAwMDBdIGNtYTogUmVzZXJ2ZWQgMzIgTWlCIGF0 
IDB4MDAwMDAwMDBiZTAwMDAwMA0KWyAgICAwLjAwMDAwMF0gTlVNQTogTm8gTlVNQSBjb25maWd1 cmF0aW9uIGZvdW5kDQpbICAgIDAuMDAwMDAwXSBOVU1BOiBGYWtpbmcgYSBub2RlIGF0IFttZW0g MHgwMDAwMDAwMDQ4MDAwMDAwLTB4MDAwMDAwMDc3ZmZmZmZmZl0NClsgICAgMC4wMDAwMDBdIE5V TUE6IE5PREVfREFUQSBbbWVtIDB4NzdlZmRiODAwLTB4NzdlZmRjZmZmXQ0KWyAgICAwLjAwMDAw MF0gWm9uZSByYW5nZXM6DQpbICAgIDAuMDAwMDAwXSAgIERNQTMyICAgIFttZW0gMHgwMDAwMDAw MDQ4MDAwMDAwLTB4MDAwMDAwMDBmZmZmZmZmZl0NClsgICAgMC4wMDAwMDBdICAgTm9ybWFsICAg W21lbSAweDAwMDAwMDAxMDAwMDAwMDAtMHgwMDAwMDAwNzdmZmZmZmZmXQ0KWyAgICAwLjAwMDAw MF0gTW92YWJsZSB6b25lIHN0YXJ0IGZvciBlYWNoIG5vZGUNClsgICAgMC4wMDAwMDBdIEVhcmx5 IG1lbW9yeSBub2RlIHJhbmdlcw0KWyAgICAwLjAwMDAwMF0gICBub2RlICAgMDogW21lbSAweDAw MDAwMDAwNDgwMDAwMDAtMHgwMDAwMDAwMGJmZmZmZmZmXQ0KWyAgICAwLjAwMDAwMF0gICBub2Rl ICAgMDogW21lbSAweDAwMDAwMDA1MDAwMDAwMDAtMHgwMDAwMDAwNTdmZmZmZmZmXQ0KWyAgICAw LjAwMDAwMF0gICBub2RlICAgMDogW21lbSAweDAwMDAwMDA2MDAwMDAwMDAtMHgwMDAwMDAwNjdm ZmZmZmZmXQ0KWyAgICAwLjAwMDAwMF0gICBub2RlICAgMDogW21lbSAweDAwMDAwMDA3MDAwMDAw MDAtMHgwMDAwMDAwNzdmZmZmZmZmXQ0KWyAgICAwLjAwMDAwMF0gSW5pdG1lbSBzZXR1cCBub2Rl IDAgW21lbSAweDAwMDAwMDAwNDgwMDAwMDAtMHgwMDAwMDAwNzdmZmZmZmZmXQ0KWyAgICAwLjAw MDAwMF0gT24gbm9kZSAwIHRvdGFscGFnZXM6IDIwNjQzODQNClsgICAgMC4wMDAwMDBdICAgRE1B MzIgem9uZTogNzY4MCBwYWdlcyB1c2VkIGZvciBtZW1tYXANClsgICAgMC4wMDAwMDBdICAgRE1B MzIgem9uZTogMCBwYWdlcyByZXNlcnZlZA0KWyAgICAwLjAwMDAwMF0gICBETUEzMiB6b25lOiA0 OTE1MjAgcGFnZXMsIExJRk8gYmF0Y2g6NjMNClsgICAgMC4wMDAwMDBdICAgTm9ybWFsIHpvbmU6 IDI0NTc2IHBhZ2VzIHVzZWQgZm9yIG1lbW1hcA0KWyAgICAwLjAwMDAwMF0gICBOb3JtYWwgem9u ZTogMTU3Mjg2NCBwYWdlcywgTElGTyBiYXRjaDo2Mw0KWyAgICAwLjAwMDAwMF0gcHNjaTogcHJv YmluZyBmb3IgY29uZHVpdCBtZXRob2QgZnJvbSBEVC4NClsgICAgMC4wMDAwMDBdIHBzY2k6IFBT Q0l2MS4xIGRldGVjdGVkIGluIGZpcm13YXJlLg0KWyAgICAwLjAwMDAwMF0gcHNjaTogVXNpbmcg c3RhbmRhcmQgUFNDSSB2MC4yIGZ1bmN0aW9uIElEcw0KWyAgICAwLjAwMDAwMF0gcHNjaTogVHJ1 c3RlZCBPUyBtaWdyYXRpb24gbm90IHJlcXVpcmVkDQpbICAgIDAuMDAwMDAwXSBwc2NpOiBTTUMg 
Q2FsbGluZyBDb252ZW50aW9uIHYxLjENClsgICAgMC4wMDAwMDBdIHBlcmNwdTogRW1iZWRkZWQg MjIgcGFnZXMvY3B1IHM1Mjk1MiByODE5MiBkMjg5NjggdTkwMTEyDQpbICAgIDAuMDAwMDAwXSBw Y3B1LWFsbG9jOiBzNTI5NTIgcjgxOTIgZDI4OTY4IHU5MDExMiBhbGxvYz0yMio0MDk2DQpbICAg IDAuMDAwMDAwXSBwY3B1LWFsbG9jOiBbMF0gMCBbMF0gMSBbMF0gMiBbMF0gMyBbMF0gNCBbMF0g NSBbMF0gNiBbMF0gNyANClsgICAgMC4wMDAwMDBdIERldGVjdGVkIFBJUFQgSS1jYWNoZSBvbiBD UFUwDQpbICAgIDAuMDAwMDAwXSBDUFUgZmVhdHVyZXM6IGRldGVjdGVkOiBFTDIgdmVjdG9yIGhh cmRlbmluZw0KWyAgICAwLjAwMDAwMF0gU3BlY3VsYXRpdmUgU3RvcmUgQnlwYXNzIERpc2FibGUg bWl0aWdhdGlvbiBub3QgcmVxdWlyZWQNClsgICAgMC4wMDAwMDBdIEJ1aWx0IDEgem9uZWxpc3Rz LCBtb2JpbGl0eSBncm91cGluZyBvbi4gIFRvdGFsIHBhZ2VzOiAyMDMyMTI4DQpbICAgIDAuMDAw MDAwXSBQb2xpY3kgem9uZTogTm9ybWFsDQpbICAgIDAuMDAwMDAwXSBLZXJuZWwgY29tbWFuZCBs aW5lOiBjb25zb2xlPXR0eVNDMCwxMTUyMDAgaWdub3JlX2xvZ2xldmVsIGNvbnNvbGVibGFuaz0w IHJ3IHJvb3Q9L2Rldi9uZnMgaXA9ZGhjcA0KWyAgICAwLjAwMDAwMF0gRGVudHJ5IGNhY2hlIGhh c2ggdGFibGUgZW50cmllczogMTA0ODU3NiAob3JkZXI6IDExLCA4Mzg4NjA4IGJ5dGVzLCBsaW5l YXIpDQpbICAgIDAuMDAwMDAwXSBJbm9kZS1jYWNoZSBoYXNoIHRhYmxlIGVudHJpZXM6IDUyNDI4 OCAob3JkZXI6IDEwLCA0MTk0MzA0IGJ5dGVzLCBsaW5lYXIpDQpbICAgIDAuMDAwMDAwXSBtZW0g YXV0by1pbml0OiBzdGFjazpvZmYsIGhlYXAgYWxsb2M6b2ZmLCBoZWFwIGZyZWU6b2ZmDQpbICAg IDAuMDAwMDAwXSBzb2Z0d2FyZSBJTyBUTEI6IG1hcHBlZCBbbWVtIDB4YmEwMDAwMDAtMHhiZTAw MDAwMF0gKDY0TUIpDQpbICAgIDAuMDAwMDAwXSBNZW1vcnk6IDc5NzIzNjhLLzgyNTc1MzZLIGF2 YWlsYWJsZSAoMTIwOTJLIGtlcm5lbCBjb2RlLCAxODQ2SyByd2RhdGEsIDYzMjBLIHJvZGF0YSwg NDk5MksgaW5pdCwgNDUwSyBic3MsIDI1MjQwMEsgcmVzZXJ2ZWQsIDMyNzY4SyBjbWEtcmVzZXJ2 ZWQpDQpbICAgIDAuMDAwMDAwXSBTTFVCOiBIV2FsaWduPTY0LCBPcmRlcj0wLTMsIE1pbk9iamVj dHM9MCwgQ1BVcz04LCBOb2Rlcz0xDQpbICAgIDAuMDAwMDAwXSByY3U6IFByZWVtcHRpYmxlIGhp ZXJhcmNoaWNhbCBSQ1UgaW1wbGVtZW50YXRpb24uDQpbICAgIDAuMDAwMDAwXSByY3U6IAlSQ1Ug cmVzdHJpY3RpbmcgQ1BVcyBmcm9tIE5SX0NQVVM9MjU2IHRvIG5yX2NwdV9pZHM9OC4NClsgICAg MC4wMDAwMDBdIAlUYXNrcyBSQ1UgZW5hYmxlZC4NClsgICAgMC4wMDAwMDBdIHJjdTogUkNVIGNh 
bGN1bGF0ZWQgdmFsdWUgb2Ygc2NoZWR1bGVyLWVubGlzdG1lbnQgZGVsYXkgaXMgMjUgamlmZmll cy4NClsgICAgMC4wMDAwMDBdIHJjdTogQWRqdXN0aW5nIGdlb21ldHJ5IGZvciByY3VfZmFub3V0 X2xlYWY9MTYsIG5yX2NwdV9pZHM9OA0KWyAgICAwLjAwMDAwMF0gTlJfSVJRUzogNjQsIG5yX2ly cXM6IDY0LCBwcmVhbGxvY2F0ZWQgaXJxczogMA0KWyAgICAwLjAwMDAwMF0gR0lDOiBBZGp1c3Rp bmcgQ1BVIGludGVyZmFjZSBiYXNlIHRvIDB4MDAwMDAwMDBmMTAyZjAwMA0KWyAgICAwLjAwMDAw MF0gR0lDOiBVc2luZyBzcGxpdCBFT0kvRGVhY3RpdmF0ZSBtb2RlDQpbICAgIDAuMDAwMDAwXSBy YW5kb206IGdldF9yYW5kb21fYnl0ZXMgY2FsbGVkIGZyb20gc3RhcnRfa2VybmVsKzB4MmYwLzB4 NDkwIHdpdGggY3JuZ19pbml0PTANClsgICAgMC4wMDAwMDBdIGFyY2hfdGltZXI6IGNwMTUgdGlt ZXIocykgcnVubmluZyBhdCA4LjMzTUh6IChwaHlzKS4NClsgICAgMC4wMDAwMDBdIGNsb2Nrc291 cmNlOiBhcmNoX3N5c19jb3VudGVyOiBtYXNrOiAweGZmZmZmZmZmZmZmZmZmIG1heF9jeWNsZXM6 IDB4MWVjMDI5MjNlLCBtYXhfaWRsZV9uczogNDQwNzk1MjAyMTI1IG5zDQpbICAgIDAuMDAwMDAy XSBzY2hlZF9jbG9jazogNTYgYml0cyBhdCA4TUh6LCByZXNvbHV0aW9uIDEyMG5zLCB3cmFwcyBl dmVyeSAyMTk5MDIzMjU1NDk2bnMNClsgICAgMC4wMDAxNDFdIENvbnNvbGU6IGNvbG91ciBkdW1t eSBkZXZpY2UgODB4MjUNClsgICAgMC4wMDAyMDddIENhbGlicmF0aW5nIGRlbGF5IGxvb3AgKHNr aXBwZWQpLCB2YWx1ZSBjYWxjdWxhdGVkIHVzaW5nIHRpbWVyIGZyZXF1ZW5jeS4uIDE2LjY2IEJv Z29NSVBTIChscGo9MzMzMzMpDQpbICAgIDAuMDAwMjE1XSBwaWRfbWF4OiBkZWZhdWx0OiAzMjc2 OCBtaW5pbXVtOiAzMDENClsgICAgMC4wMDAyNzBdIExTTTogU2VjdXJpdHkgRnJhbWV3b3JrIGlu aXRpYWxpemluZw0KWyAgICAwLjAwMDM1Ml0gTW91bnQtY2FjaGUgaGFzaCB0YWJsZSBlbnRyaWVz OiAxNjM4NCAob3JkZXI6IDUsIDEzMTA3MiBieXRlcywgbGluZWFyKQ0KWyAgICAwLjAwMDM5OF0g TW91bnRwb2ludC1jYWNoZSBoYXNoIHRhYmxlIGVudHJpZXM6IDE2Mzg0IChvcmRlcjogNSwgMTMx MDcyIGJ5dGVzLCBsaW5lYXIpDQpbICAgIDAuMDIzOTc4XSBBU0lEIGFsbG9jYXRvciBpbml0aWFs aXNlZCB3aXRoIDMyNzY4IGVudHJpZXMNClsgICAgMC4wMzE5NjhdIHJjdTogSGllcmFyY2hpY2Fs IFNSQ1UgaW1wbGVtZW50YXRpb24uDQpbICAgIDAuMDQxMDQwXSBEZXRlY3RlZCBSZW5lc2FzIFIt Q2FyIEdlbjMgcjhhNzc5NSBFUzMuMA0KWyAgICAwLjA0MjM2NF0gRUZJIHNlcnZpY2VzIHdpbGwg bm90IGJlIGF2YWlsYWJsZS4NClsgICAgMC4wNDc5OTZdIHNtcDogQnJpbmdpbmcgdXAgc2Vjb25k 
YXJ5IENQVXMgLi4uDQpbICAgIDAuMDgwMTgzXSBEZXRlY3RlZCBQSVBUIEktY2FjaGUgb24gQ1BV MQ0KWyAgICAwLjA4MDIyMl0gQ1BVMTogQm9vdGVkIHNlY29uZGFyeSBwcm9jZXNzb3IgMHgwMDAw MDAwMDAxIFsweDQxMWZkMDczXQ0KWyAgICAwLjExMjE5NV0gRGV0ZWN0ZWQgUElQVCBJLWNhY2hl IG9uIENQVTINClsgICAgMC4xMTIyMTZdIENQVTI6IEJvb3RlZCBzZWNvbmRhcnkgcHJvY2Vzc29y IDB4MDAwMDAwMDAwMiBbMHg0MTFmZDA3M10NClsgICAgMC4xNDQyMzJdIERldGVjdGVkIFBJUFQg SS1jYWNoZSBvbiBDUFUzDQpbICAgIDAuMTQ0MjUzXSBDUFUzOiBCb290ZWQgc2Vjb25kYXJ5IHBy b2Nlc3NvciAweDAwMDAwMDAwMDMgWzB4NDExZmQwNzNdDQpbICAgIDAuMTc2Mjc2XSBDUFUgZmVh dHVyZXM6IGRldGVjdGVkOiBBUk0gZXJyYXR1bSA4NDU3MTkNClsgICAgMC4xNzYyODZdIERldGVj dGVkIFZJUFQgSS1jYWNoZSBvbiBDUFU0DQpbICAgIDAuMTc2MzI0XSBDUFU0OiBCb290ZWQgc2Vj b25kYXJ5IHByb2Nlc3NvciAweDAwMDAwMDAxMDAgWzB4NDEwZmQwMzRdDQpbICAgIDAuMjA4Mjk3 XSBEZXRlY3RlZCBWSVBUIEktY2FjaGUgb24gQ1BVNQ0KWyAgICAwLjIwODMyMV0gQ1BVNTogQm9v dGVkIHNlY29uZGFyeSBwcm9jZXNzb3IgMHgwMDAwMDAwMTAxIFsweDQxMGZkMDM0XQ0KWyAgICAw LjI0MDMzOF0gRGV0ZWN0ZWQgVklQVCBJLWNhY2hlIG9uIENQVTYNClsgICAgMC4yNDAzNjFdIENQ VTY6IEJvb3RlZCBzZWNvbmRhcnkgcHJvY2Vzc29yIDB4MDAwMDAwMDEwMiBbMHg0MTBmZDAzNF0N ClsgICAgMC4yNzIzNzVdIERldGVjdGVkIFZJUFQgSS1jYWNoZSBvbiBDUFU3DQpbICAgIDAuMjcy Mzk4XSBDUFU3OiBCb290ZWQgc2Vjb25kYXJ5IHByb2Nlc3NvciAweDAwMDAwMDAxMDMgWzB4NDEw ZmQwMzRdDQpbICAgIDAuMjcyNDczXSBzbXA6IEJyb3VnaHQgdXAgMSBub2RlLCA4IENQVXMNClsg ICAgMC4yNzI0OTJdIFNNUDogVG90YWwgb2YgOCBwcm9jZXNzb3JzIGFjdGl2YXRlZC4NClsgICAg MC4yNzI0OTddIENQVSBmZWF0dXJlczogZGV0ZWN0ZWQ6IDMyLWJpdCBFTDAgU3VwcG9ydA0KWyAg ICAwLjI3MjUwMl0gQ1BVIGZlYXR1cmVzOiBkZXRlY3RlZDogQ1JDMzIgaW5zdHJ1Y3Rpb25zDQpb ICAgIDAuMjgyNzQ5XSBDUFU6IEFsbCBDUFUocykgc3RhcnRlZCBhdCBFTDINClsgICAgMC4yODI3 NzddIGFsdGVybmF0aXZlczogcGF0Y2hpbmcga2VybmVsIGNvZGUNClsgICAgMC4yODM4MTVdIGRl dnRtcGZzOiBpbml0aWFsaXplZA0KWyAgICAwLjI4OTY0NF0gY2xvY2tzb3VyY2U6IGppZmZpZXM6 IG1hc2s6IDB4ZmZmZmZmZmYgbWF4X2N5Y2xlczogMHhmZmZmZmZmZiwgbWF4X2lkbGVfbnM6IDc2 NDUwNDE3ODUxMDAwMDAgbnMNClsgICAgMC4yODk2NTldIGZ1dGV4IGhhc2ggdGFibGUgZW50cmll 
czogMjA0OCAob3JkZXI6IDUsIDEzMTA3MiBieXRlcywgbGluZWFyKQ0KWyAgICAwLjI5MDM1M10g cGluY3RybCBjb3JlOiBpbml0aWFsaXplZCBwaW5jdHJsIHN1YnN5c3RlbQ0KWyAgICAwLjI5MTUz OF0gRE1JIG5vdCBwcmVzZW50IG9yIGludmFsaWQuDQpbICAgIDAuMjkxNzc3XSBORVQ6IFJlZ2lz dGVyZWQgcHJvdG9jb2wgZmFtaWx5IDE2DQpbICAgIDAuMjkyNTYxXSBETUE6IHByZWFsbG9jYXRl ZCAyNTYgS2lCIHBvb2wgZm9yIGF0b21pYyBhbGxvY2F0aW9ucw0KWyAgICAwLjI5MjU3MV0gYXVk aXQ6IGluaXRpYWxpemluZyBuZXRsaW5rIHN1YnN5cyAoZGlzYWJsZWQpDQpbICAgIDAuMjkyNzA5 XSBhdWRpdDogdHlwZT0yMDAwIGF1ZGl0KDAuMjkyOjEpOiBzdGF0ZT1pbml0aWFsaXplZCBhdWRp dF9lbmFibGVkPTAgcmVzPTENClsgICAgMC4yOTM3NDNdIGNwdWlkbGU6IHVzaW5nIGdvdmVybm9y IG1lbnUNClsgICAgMC4yOTM4OThdIGh3LWJyZWFrcG9pbnQ6IGZvdW5kIDYgYnJlYWtwb2ludCBh bmQgNCB3YXRjaHBvaW50IHJlZ2lzdGVycy4NClsgICAgMC4yOTQ4NjJdIFNlcmlhbDogQU1CQSBQ TDAxMSBVQVJUIGRyaXZlcg0KWyAgICAwLjI5NzA2MV0gc2gtcGZjIGU2MDYwMDAwLnBpbi1jb250 cm9sbGVyOiBJUlEgaW5kZXggMCBub3QgZm91bmQNClsgICAgMC4yOTcyODBdIHNoLXBmYyBlNjA2 MDAwMC5waW4tY29udHJvbGxlcjogcjhhNzc5NTFfcGZjIHN1cHBvcnQgcmVnaXN0ZXJlZA0KWyAg ICAwLjMxNzQ5MF0gSHVnZVRMQiByZWdpc3RlcmVkIDEuMDAgR2lCIHBhZ2Ugc2l6ZSwgcHJlLWFs bG9jYXRlZCAwIHBhZ2VzDQpbICAgIDAuMzE3NDk4XSBIdWdlVExCIHJlZ2lzdGVyZWQgMzIuMCBN aUIgcGFnZSBzaXplLCBwcmUtYWxsb2NhdGVkIDAgcGFnZXMNClsgICAgMC4zMTc1MDNdIEh1Z2VU TEIgcmVnaXN0ZXJlZCAyLjAwIE1pQiBwYWdlIHNpemUsIHByZS1hbGxvY2F0ZWQgMCBwYWdlcw0K WyAgICAwLjMxNzUwNl0gSHVnZVRMQiByZWdpc3RlcmVkIDY0LjAgS2lCIHBhZ2Ugc2l6ZSwgcHJl LWFsbG9jYXRlZCAwIHBhZ2VzDQpbICAgIDAuMzE5Mjc4XSBjcnlwdGQ6IG1heF9jcHVfcWxlbiBz ZXQgdG8gMTAwMA0KWyAgICAwLjMyMjE2Ml0gQUNQSTogSW50ZXJwcmV0ZXIgZGlzYWJsZWQuDQpb ICAgIDAuMzI1NzA3XSBpb21tdTogRGVmYXVsdCBkb21haW4gdHlwZTogVHJhbnNsYXRlZCANClsg ICAgMC4zMjU5MDldIHZnYWFyYjogbG9hZGVkDQpbICAgIDAuMzI2MTAwXSBTQ1NJIHN1YnN5c3Rl bSBpbml0aWFsaXplZA0KWyAgICAwLjMyNjE5OV0gbGliYXRhIHZlcnNpb24gMy4wMCBsb2FkZWQu DQpbICAgIDAuMzI2MzMzXSB1c2Jjb3JlOiByZWdpc3RlcmVkIG5ldyBpbnRlcmZhY2UgZHJpdmVy IHVzYmZzDQpbICAgIDAuMzI2MzUzXSB1c2Jjb3JlOiByZWdpc3RlcmVkIG5ldyBpbnRlcmZhY2Ug 
ZHJpdmVyIGh1Yg0KWyAgICAwLjMyNjM5NV0gdXNiY29yZTogcmVnaXN0ZXJlZCBuZXcgZGV2aWNl IGRyaXZlciB1c2INClsgICAgMC4zMjczMzZdIGkyYy1zaF9tb2JpbGUgZTYwYjAwMDAuaTJjOiBJ MkMgYWRhcHRlciA3LCBidXMgc3BlZWQgNDAwMDAwIEh6DQpbICAgIDAuMzI3NjMxXSBwcHNfY29y ZTogTGludXhQUFMgQVBJIHZlci4gMSByZWdpc3RlcmVkDQpbICAgIDAuMzI3NjM2XSBwcHNfY29y ZTogU29mdHdhcmUgdmVyLiA1LjMuNiAtIENvcHlyaWdodCAyMDA1LTIwMDcgUm9kb2xmbyBHaW9t ZXR0aSA8Z2lvbWV0dGlAbGludXguaXQ+DQpbICAgIDAuMzI3NjQ0XSBQVFAgY2xvY2sgc3VwcG9y dCByZWdpc3RlcmVkDQpbICAgIDAuMzI3NzcwXSBFREFDIE1DOiBWZXI6IDMuMC4wDQpbICAgIDAu MzI5MDQ3XSBGUEdBIG1hbmFnZXIgZnJhbWV3b3JrDQpbICAgIDAuMzI5MDg5XSBBZHZhbmNlZCBM aW51eCBTb3VuZCBBcmNoaXRlY3R1cmUgRHJpdmVyIEluaXRpYWxpemVkLg0KWyAgICAwLjMyOTU0 OF0gY2xvY2tzb3VyY2U6IFN3aXRjaGVkIHRvIGNsb2Nrc291cmNlIGFyY2hfc3lzX2NvdW50ZXIN ClsgICAgMC4zMjk2OTZdIFZGUzogRGlzayBxdW90YXMgZHF1b3RfNi42LjANClsgICAgMC4zMjk3 MzhdIFZGUzogRHF1b3QtY2FjaGUgaGFzaCB0YWJsZSBlbnRyaWVzOiA1MTIgKG9yZGVyIDAsIDQw OTYgYnl0ZXMpDQpbICAgIDAuMzI5ODU1XSBwbnA6IFBuUCBBQ1BJOiBkaXNhYmxlZA0KWyAgICAw LjMzMjg0M10gdGhlcm1hbF9zeXM6IFJlZ2lzdGVyZWQgdGhlcm1hbCBnb3Zlcm5vciAnc3RlcF93 aXNlJw0KWyAgICAwLjMzMjg0NV0gdGhlcm1hbF9zeXM6IFJlZ2lzdGVyZWQgdGhlcm1hbCBnb3Zl cm5vciAncG93ZXJfYWxsb2NhdG9yJw0KWyAgICAwLjMzMzMzN10gTkVUOiBSZWdpc3RlcmVkIHBy b3RvY29sIGZhbWlseSAyDQpbICAgIDAuMzMzNjE4XSB0Y3BfbGlzdGVuX3BvcnRhZGRyX2hhc2gg aGFzaCB0YWJsZSBlbnRyaWVzOiA0MDk2IChvcmRlcjogNCwgNjU1MzYgYnl0ZXMsIGxpbmVhcikN ClsgICAgMC4zMzM2ODJdIFRDUCBlc3RhYmxpc2hlZCBoYXNoIHRhYmxlIGVudHJpZXM6IDY1NTM2 IChvcmRlcjogNywgNTI0Mjg4IGJ5dGVzLCBsaW5lYXIpDQpbICAgIDAuMzMzOTYxXSBUQ1AgYmlu ZCBoYXNoIHRhYmxlIGVudHJpZXM6IDY1NTM2IChvcmRlcjogOCwgMTA0ODU3NiBieXRlcywgbGlu ZWFyKQ0KWyAgICAwLjMzNDU1MV0gVENQOiBIYXNoIHRhYmxlcyBjb25maWd1cmVkIChlc3RhYmxp c2hlZCA2NTUzNiBiaW5kIDY1NTM2KQ0KWyAgICAwLjMzNDY2MV0gVURQIGhhc2ggdGFibGUgZW50 cmllczogNDA5NiAob3JkZXI6IDUsIDEzMTA3MiBieXRlcywgbGluZWFyKQ0KWyAgICAwLjMzNDc2 OV0gVURQLUxpdGUgaGFzaCB0YWJsZSBlbnRyaWVzOiA0MDk2IChvcmRlcjogNSwgMTMxMDcyIGJ5 
dGVzLCBsaW5lYXIpDQpbICAgIDAuMzM0OTc5XSBORVQ6IFJlZ2lzdGVyZWQgcHJvdG9jb2wgZmFt aWx5IDENClsgICAgMC4zMzUzMjldIFJQQzogUmVnaXN0ZXJlZCBuYW1lZCBVTklYIHNvY2tldCB0 cmFuc3BvcnQgbW9kdWxlLg0KWyAgICAwLjMzNTMzM10gUlBDOiBSZWdpc3RlcmVkIHVkcCB0cmFu c3BvcnQgbW9kdWxlLg0KWyAgICAwLjMzNTMzN10gUlBDOiBSZWdpc3RlcmVkIHRjcCB0cmFuc3Bv cnQgbW9kdWxlLg0KWyAgICAwLjMzNTM0MF0gUlBDOiBSZWdpc3RlcmVkIHRjcCBORlN2NC4xIGJh Y2tjaGFubmVsIHRyYW5zcG9ydCBtb2R1bGUuDQpbICAgIDAuMzM1MzQ5XSBQQ0k6IENMUyAwIGJ5 dGVzLCBkZWZhdWx0IDY0DQpbICAgIDAuMzM2MTc5XSBodyBwZXJmZXZlbnRzOiBlbmFibGVkIHdp dGggYXJtdjhfY29ydGV4X2E1MyBQTVUgZHJpdmVyLCA3IGNvdW50ZXJzIGF2YWlsYWJsZQ0KWyAg ICAwLjMzNjQxM10gaHcgcGVyZmV2ZW50czogZW5hYmxlZCB3aXRoIGFybXY4X2NvcnRleF9hNTcg UE1VIGRyaXZlciwgNyBjb3VudGVycyBhdmFpbGFibGUNClsgICAgMC4zMzY4MzddIGt2bSBbMV06 IElQQSBTaXplIExpbWl0OiA0MGJpdHMNClsgICAgMC4zMzczMTNdIGt2bSBbMV06IHZnaWMgaW50 ZXJydXB0IElSUTENClsgICAgMC4zMzc0NThdIGt2bSBbMV06IEh5cCBtb2RlIGluaXRpYWxpemVk IHN1Y2Nlc3NmdWxseQ0KWyAgICAwLjM0MTgzNF0gSW5pdGlhbGlzZSBzeXN0ZW0gdHJ1c3RlZCBr ZXlyaW5ncw0KWyAgICAwLjM0MTkyNF0gd29ya2luZ3NldDogdGltZXN0YW1wX2JpdHM9NDQgbWF4 X29yZGVyPTIxIGJ1Y2tldF9vcmRlcj0wDQpbICAgIDAuMzQ1MTQ4XSBzcXVhc2hmczogdmVyc2lv biA0LjAgKDIwMDkvMDEvMzEpIFBoaWxsaXAgTG91Z2hlcg0KWyAgICAwLjM0NTU4NV0gTkZTOiBS ZWdpc3RlcmluZyB0aGUgaWRfcmVzb2x2ZXIga2V5IHR5cGUNClsgICAgMC4zNDU2MDJdIEtleSB0 eXBlIGlkX3Jlc29sdmVyIHJlZ2lzdGVyZWQNClsgICAgMC4zNDU2MDZdIEtleSB0eXBlIGlkX2xl Z2FjeSByZWdpc3RlcmVkDQpbICAgIDAuMzQ1NjE1XSBuZnM0ZmlsZWxheW91dF9pbml0OiBORlN2 NCBGaWxlIExheW91dCBEcml2ZXIgUmVnaXN0ZXJpbmcuLi4NClsgICAgMC4zNDU3MTRdIDlwOiBJ bnN0YWxsaW5nIHY5ZnMgOXAyMDAwIGZpbGUgc3lzdGVtIHN1cHBvcnQNClsgICAgMC4zNTQ4MTRd IEtleSB0eXBlIGFzeW1tZXRyaWMgcmVnaXN0ZXJlZA0KWyAgICAwLjM1NDgxOV0gQXN5bW1ldHJp YyBrZXkgcGFyc2VyICd4NTA5JyByZWdpc3RlcmVkDQpbICAgIDAuMzU0ODQ0XSBCbG9jayBsYXll ciBTQ1NJIGdlbmVyaWMgKGJzZykgZHJpdmVyIHZlcnNpb24gMC40IGxvYWRlZCAobWFqb3IgMjQ1 KQ0KWyAgICAwLjM1NDg0OV0gaW8gc2NoZWR1bGVyIG1xLWRlYWRsaW5lIHJlZ2lzdGVyZWQNClsg 
ICAgMC4zNTQ4NTNdIGlvIHNjaGVkdWxlciBreWJlciByZWdpc3RlcmVkDQpbICAgIDAuMzU5NDc2 XSBwaHlfcmNhcl9nZW4zX3VzYjIgZWUwYTAyMDAudXNiLXBoeTogSVJRIGluZGV4IDAgbm90IGZv dW5kDQpbICAgIDAuMzYwMTc0XSBwaHlfcmNhcl9nZW4zX3VzYjIgZWUwYzAyMDAudXNiLXBoeTog SVJRIGluZGV4IDAgbm90IGZvdW5kDQpbICAgIDAuMzY1ODgxXSBncGlvX3JjYXIgZTYwNTAwMDAu Z3BpbzogZHJpdmluZyAxNiBHUElPcw0KWyAgICAwLjM2NjA1N10gZ3Bpb19yY2FyIGU2MDUxMDAw LmdwaW86IGRyaXZpbmcgMjkgR1BJT3MNClsgICAgMC4zNjYyMjJdIGdwaW9fcmNhciBlNjA1MjAw MC5ncGlvOiBkcml2aW5nIDE1IEdQSU9zDQpbICAgIDAuMzY2Mzc2XSBncGlvX3JjYXIgZTYwNTMw MDAuZ3BpbzogZHJpdmluZyAxNiBHUElPcw0KWyAgICAwLjM2NjUzNl0gZ3Bpb19yY2FyIGU2MDU0 MDAwLmdwaW86IGRyaXZpbmcgMTggR1BJT3MNClsgICAgMC4zNjY2ODhdIGdwaW9fcmNhciBlNjA1 NTAwMC5ncGlvOiBkcml2aW5nIDI2IEdQSU9zDQpbICAgIDAuMzY2ODYwXSBncGlvX3JjYXIgZTYw NTU0MDAuZ3BpbzogZHJpdmluZyAzMiBHUElPcw0KWyAgICAwLjM2NzAxNV0gZ3Bpb19yY2FyIGU2 MDU1ODAwLmdwaW86IGRyaXZpbmcgNCBHUElPcw0KWyAgICAwLjM2ODQ3M10gcmNhci1wY2llIGZl MDAwMDAwLnBjaWU6IGhvc3QgYnJpZGdlIC9zb2MvcGNpZUBmZTAwMDAwMCByYW5nZXM6DQpbICAg IDAuMzY4NDk4XSByY2FyLXBjaWUgZmUwMDAwMDAucGNpZTogICAgSU8gMHhmZTEwMDAwMC4uMHhm ZTFmZmZmZiAtPiAweDAwMDAwMDAwDQpbICAgIDAuMzY4NTE0XSByY2FyLXBjaWUgZmUwMDAwMDAu cGNpZTogICBNRU0gMHhmZTIwMDAwMC4uMHhmZTNmZmZmZiAtPiAweGZlMjAwMDAwDQpbICAgIDAu MzY4NTI3XSByY2FyLXBjaWUgZmUwMDAwMDAucGNpZTogICBNRU0gMHgzMDAwMDAwMC4uMHgzN2Zm ZmZmZiAtPiAweDMwMDAwMDAwDQpbICAgIDAuMzY4NTM2XSByY2FyLXBjaWUgZmUwMDAwMDAucGNp ZTogICBNRU0gMHgzODAwMDAwMC4uMHgzZmZmZmZmZiAtPiAweDM4MDAwMDAwDQpbICAgIDAuNDM3 MDM3XSByY2FyLXBjaWUgZmUwMDAwMDAucGNpZTogUENJZSBsaW5rIGRvd24NClsgICAgMC40Mzcx ODddIHJjYXItcGNpZSBlZTgwMDAwMC5wY2llOiBob3N0IGJyaWRnZSAvc29jL3BjaWVAZWU4MDAw MDAgcmFuZ2VzOg0KWyAgICAwLjQzNzIwNV0gcmNhci1wY2llIGVlODAwMDAwLnBjaWU6ICAgIElP IDB4ZWU5MDAwMDAuLjB4ZWU5ZmZmZmYgLT4gMHgwMDAwMDAwMA0KWyAgICAwLjQzNzIxOF0gcmNh ci1wY2llIGVlODAwMDAwLnBjaWU6ICAgTUVNIDB4ZWVhMDAwMDAuLjB4ZWViZmZmZmYgLT4gMHhl ZWEwMDAwMA0KWyAgICAwLjQzNzIzMF0gcmNhci1wY2llIGVlODAwMDAwLnBjaWU6ICAgTUVNIDB4 
YzAwMDAwMDAuLjB4YzdmZmZmZmYgLT4gMHhjMDAwMDAwMA0KWyAgICAwLjQzNzIzOV0gcmNhci1w Y2llIGVlODAwMDAwLnBjaWU6ICAgTUVNIDB4YzgwMDAwMDAuLjB4Y2ZmZmZmZmYgLT4gMHhjODAw MDAwMA0KWyAgICAwLjUwMTAzNl0gcmNhci1wY2llIGVlODAwMDAwLnBjaWU6IFBDSWUgbGluayBk b3duDQpbICAgIDAuNTAyOTU5XSBFSU5KOiBBQ1BJIGRpc2FibGVkLg0KWyAgICAwLjUxNDQ1OF0g U2VyaWFsOiA4MjUwLzE2NTUwIGRyaXZlciwgNCBwb3J0cywgSVJRIHNoYXJpbmcgZW5hYmxlZA0K WyAgICAwLjUxNjI4NV0gU3VwZXJIIChIKVNDSShGKSBkcml2ZXIgaW5pdGlhbGl6ZWQNClsgICAg MC41MTY2MDhdIHNoLXNjaSBlNjU1MDAwMC5zZXJpYWw6IElSUSBpbmRleCAxIG5vdCBmb3VuZA0K WyAgICAwLjUxNjYxNl0gc2gtc2NpIGU2NTUwMDAwLnNlcmlhbDogSVJRIGluZGV4IDIgbm90IGZv dW5kDQpbICAgIDAuNTE2NjI0XSBzaC1zY2kgZTY1NTAwMDAuc2VyaWFsOiBJUlEgaW5kZXggMyBu b3QgZm91bmQNClsgICAgMC41MTY2MzBdIHNoLXNjaSBlNjU1MDAwMC5zZXJpYWw6IElSUSBpbmRl eCA0IG5vdCBmb3VuZA0KWyAgICAwLjUxNjYzN10gc2gtc2NpIGU2NTUwMDAwLnNlcmlhbDogSVJR IGluZGV4IDUgbm90IGZvdW5kDQpbICAgIDAuNTE2Njg1XSBlNjU1MDAwMC5zZXJpYWw6IHR0eVND MSBhdCBNTUlPIDB4ZTY1NTAwMDAgKGlycSA9IDM0LCBiYXNlX2JhdWQgPSAwKSBpcyBhIGhzY2lm DQpbICAgIDAuNTE3MTEyXSBzaC1zY2kgZTZlODgwMDAuc2VyaWFsOiBJUlEgaW5kZXggMSBub3Qg Zm91bmQNClsgICAgMC41MTcxMjFdIHNoLXNjaSBlNmU4ODAwMC5zZXJpYWw6IElSUSBpbmRleCAy IG5vdCBmb3VuZA0KWyAgICAwLjUxNzEyOF0gc2gtc2NpIGU2ZTg4MDAwLnNlcmlhbDogSVJRIGlu ZGV4IDMgbm90IGZvdW5kDQpbICAgIDAuNTE3MTM0XSBzaC1zY2kgZTZlODgwMDAuc2VyaWFsOiBJ UlEgaW5kZXggNCBub3QgZm91bmQNClsgICAgMC41MTcxNDBdIHNoLXNjaSBlNmU4ODAwMC5zZXJp YWw6IElSUSBpbmRleCA1IG5vdCBmb3VuZA0KWyAgICAwLjUxNzE2OV0gZTZlODgwMDAuc2VyaWFs OiB0dHlTQzAgYXQgTU1JTyAweGU2ZTg4MDAwIChpcnEgPSAxMTksIGJhc2VfYmF1ZCA9IDApIGlz IGEgc2NpZg0KWyAgICAxLjY2MTA0N10gcHJpbnRrOiBjb25zb2xlIFt0dHlTQzBdIGVuYWJsZWQN ClsgICAgMS42NjYwODRdIG1zbV9zZXJpYWw6IGRyaXZlciBpbml0aWFsaXplZA0KWyAgICAxLjY3 Njg3NF0gbG9vcDogbW9kdWxlIGxvYWRlZA0KWyAgICAxLjY4NDc4MF0gbGlicGh5OiBGaXhlZCBN RElPIEJ1czogcHJvYmVkDQpbICAgIDEuNjg5MDIzXSB0dW46IFVuaXZlcnNhbCBUVU4vVEFQIGRl dmljZSBkcml2ZXIsIDEuNg0KWyAgICAxLjY5NDg0Ml0gdGh1bmRlcl94Y3YsIHZlciAxLjANClsg 
ICAgMS42OTgwOTldIHRodW5kZXJfYmd4LCB2ZXIgMS4wDQpbICAgIDEuNzAxMzM2XSBuaWNwZiwg dmVyIDEuMA0KWyAgICAxLjcwNDY1N10gaGNsZ2UgaXMgaW5pdGlhbGl6aW5nDQpbICAgIDEuNzA3 OTcxXSBobnMzOiBIaXNpbGljb24gRXRoZXJuZXQgTmV0d29yayBEcml2ZXIgZm9yIEhpcDA4IEZh bWlseSAtIHZlcnNpb24NClsgICAgMS43MTUxODldIGhuczM6IENvcHlyaWdodCAoYykgMjAxNyBI dWF3ZWkgQ29ycG9yYXRpb24uDQpbICAgIDEuNzIwNTI1XSBlMTAwMGU6IEludGVsKFIpIFBSTy8x MDAwIE5ldHdvcmsgRHJpdmVyIC0gMy4yLjYtaw0KWyAgICAxLjcyNjM1NV0gZTEwMDBlOiBDb3B5 cmlnaHQoYykgMTk5OSAtIDIwMTUgSW50ZWwgQ29ycG9yYXRpb24uDQpbICAgIDEuNzMyMjkyXSBp Z2I6IEludGVsKFIpIEdpZ2FiaXQgRXRoZXJuZXQgTmV0d29yayBEcml2ZXIgLSB2ZXJzaW9uIDUu Ni4wLWsNClsgICAgMS43MzkyNTBdIGlnYjogQ29weXJpZ2h0IChjKSAyMDA3LTIwMTQgSW50ZWwg Q29ycG9yYXRpb24uDQpbICAgIDEuNzQ0ODM2XSBpZ2J2ZjogSW50ZWwoUikgR2lnYWJpdCBWaXJ0 dWFsIEZ1bmN0aW9uIE5ldHdvcmsgRHJpdmVyIC0gdmVyc2lvbiAyLjQuMC1rDQpbICAgIDEuNzUy NjYyXSBpZ2J2ZjogQ29weXJpZ2h0IChjKSAyMDA5IC0gMjAxMiBJbnRlbCBDb3Jwb3JhdGlvbi4N ClsgICAgMS43NTg5MTBdIHNreTI6IGRyaXZlciB2ZXJzaW9uIDEuMzANClsgICAgMS43NjM1MzBd IFZGSU8gLSBVc2VyIExldmVsIG1ldGEtZHJpdmVyIHZlcnNpb246IDAuMw0KWyAgICAxLjc3MDA2 N10gZWhjaV9oY2Q6IFVTQiAyLjAgJ0VuaGFuY2VkJyBIb3N0IENvbnRyb2xsZXIgKEVIQ0kpIERy aXZlcg0KWyAgICAxLjc3NjU5Nl0gZWhjaS1wY2k6IEVIQ0kgUENJIHBsYXRmb3JtIGRyaXZlcg0K WyAgICAxLjc4MTA1MF0gZWhjaS1wbGF0Zm9ybTogRUhDSSBnZW5lcmljIHBsYXRmb3JtIGRyaXZl cg0KWyAgICAxLjc4NjYwOV0gZWhjaS1wbGF0Zm9ybSBlZTBhMDEwMC51c2I6IEVIQ0kgSG9zdCBD b250cm9sbGVyDQpbICAgIDEuNzkyMjg3XSBlaGNpLXBsYXRmb3JtIGVlMGEwMTAwLnVzYjogbmV3 IFVTQiBidXMgcmVnaXN0ZXJlZCwgYXNzaWduZWQgYnVzIG51bWJlciAxDQpbICAgIDEuODAwMjAw XSBlaGNpLXBsYXRmb3JtIGVlMGEwMTAwLnVzYjogaXJxIDE2NSwgaW8gbWVtIDB4ZWUwYTAxMDAN ClsgICAgMS44MjE1NjhdIGVoY2ktcGxhdGZvcm0gZWUwYTAxMDAudXNiOiBVU0IgMi4wIHN0YXJ0 ZWQsIEVIQ0kgMS4xMA0KWyAgICAxLjgyODA4N10gaHViIDEtMDoxLjA6IFVTQiBodWIgZm91bmQN ClsgICAgMS44MzE4NTZdIGh1YiAxLTA6MS4wOiAxIHBvcnQgZGV0ZWN0ZWQNClsgICAgMS44MzYw NDRdIGVoY2ktcGxhdGZvcm0gZWUwYzAxMDAudXNiOiBFSENJIEhvc3QgQ29udHJvbGxlcg0KWyAg 
ICAxLjg0MTcxMV0gZWhjaS1wbGF0Zm9ybSBlZTBjMDEwMC51c2I6IG5ldyBVU0IgYnVzIHJlZ2lz dGVyZWQsIGFzc2lnbmVkIGJ1cyBudW1iZXIgMg0KWyAgICAxLjg0OTU5Ml0gZWhjaS1wbGF0Zm9y bSBlZTBjMDEwMC51c2I6IGlycSAxNjYsIGlvIG1lbSAweGVlMGMwMTAwDQpbICAgIDEuODY5NTU1 XSBlaGNpLXBsYXRmb3JtIGVlMGMwMTAwLnVzYjogVVNCIDIuMCBzdGFydGVkLCBFSENJIDEuMTAN ClsgICAgMS44NzU5OTNdIGh1YiAyLTA6MS4wOiBVU0IgaHViIGZvdW5kDQpbICAgIDEuODc5NzU3 XSBodWIgMi0wOjEuMDogMSBwb3J0IGRldGVjdGVkDQpbICAgIDEuODgzOTEwXSBlaGNpLW9yaW9u OiBFSENJIG9yaW9uIGRyaXZlcg0KWyAgICAxLjg4ODA5OF0gZWhjaS1leHlub3M6IEVIQ0kgRVhZ Tk9TIGRyaXZlcg0KWyAgICAxLjg5MjM3MV0gb2hjaV9oY2Q6IFVTQiAxLjEgJ09wZW4nIEhvc3Qg Q29udHJvbGxlciAoT0hDSSkgRHJpdmVyDQpbICAgIDEuODk4NTYyXSBvaGNpLXBjaTogT0hDSSBQ Q0kgcGxhdGZvcm0gZHJpdmVyDQpbICAgIDEuOTAzMDMzXSBvaGNpLXBsYXRmb3JtOiBPSENJIGdl bmVyaWMgcGxhdGZvcm0gZHJpdmVyDQpbICAgIDEuOTA4NDk2XSBvaGNpLXBsYXRmb3JtIGVlMGEw MDAwLnVzYjogR2VuZXJpYyBQbGF0Zm9ybSBPSENJIGNvbnRyb2xsZXINClsgICAgMS45MTUyMDld IG9oY2ktcGxhdGZvcm0gZWUwYTAwMDAudXNiOiBuZXcgVVNCIGJ1cyByZWdpc3RlcmVkLCBhc3Np Z25lZCBidXMgbnVtYmVyIDMNClsgICAgMS45MjMwNzJdIG9oY2ktcGxhdGZvcm0gZWUwYTAwMDAu dXNiOiBpcnEgMTY1LCBpbyBtZW0gMHhlZTBhMDAwMA0KWyAgICAyLjAxNjUzNF0gaHViIDMtMDox LjA6IFVTQiBodWIgZm91bmQNClsgICAgMi4wMjAyOThdIGh1YiAzLTA6MS4wOiAxIHBvcnQgZGV0 ZWN0ZWQNClsgICAgMi4wMjQ0MzhdIG9oY2ktcGxhdGZvcm0gZWUwYzAwMDAudXNiOiBHZW5lcmlj IFBsYXRmb3JtIE9IQ0kgY29udHJvbGxlcg0KWyAgICAyLjAzMTE0N10gb2hjaS1wbGF0Zm9ybSBl ZTBjMDAwMC51c2I6IG5ldyBVU0IgYnVzIHJlZ2lzdGVyZWQsIGFzc2lnbmVkIGJ1cyBudW1iZXIg NA0KWyAgICAyLjAzOTAyNl0gb2hjaS1wbGF0Zm9ybSBlZTBjMDAwMC51c2I6IGlycSAxNjYsIGlv IG1lbSAweGVlMGMwMDAwDQpbICAgIDIuMTMyNTE5XSBodWIgNC0wOjEuMDogVVNCIGh1YiBmb3Vu ZA0KWyAgICAyLjEzNjI4MV0gaHViIDQtMDoxLjA6IDEgcG9ydCBkZXRlY3RlZA0KWyAgICAyLjE0 MDQxN10gb2hjaS1leHlub3M6IE9IQ0kgRVhZTk9TIGRyaXZlcg0KWyAgICAyLjE0NTExMF0geGhj aS1oY2QgZWUwMDAwMDAudXNiOiB4SENJIEhvc3QgQ29udHJvbGxlcg0KWyAgICAyLjE1MDM0NF0g eGhjaS1oY2QgZWUwMDAwMDAudXNiOiBuZXcgVVNCIGJ1cyByZWdpc3RlcmVkLCBhc3NpZ25lZCBi 
dXMgbnVtYmVyIDUNClsgICAgMi4xNTc3ODJdIHhoY2ktaGNkIGVlMDAwMDAwLnVzYjogRGlyZWN0 IGZpcm13YXJlIGxvYWQgZm9yIHI4YTc3OXhfdXNiM192My5kbG1lbSBmYWlsZWQgd2l0aCBlcnJv ciAtMg0KWyAgICAyLjE2NzA5OF0geGhjaS1oY2QgZWUwMDAwMDAudXNiOiBjYW4ndCBzZXR1cDog LTINClsgICAgMi4xNzE4OTVdIHhoY2ktaGNkIGVlMDAwMDAwLnVzYjogVVNCIGJ1cyA1IGRlcmVn aXN0ZXJlZA0KWyAgICAyLjE3NzMyNF0geGhjaS1oY2Q6IHByb2JlIG9mIGVlMDAwMDAwLnVzYiBm YWlsZWQgd2l0aCBlcnJvciAtMg0KWyAgICAyLjE4MzY1NV0gdXNiY29yZTogcmVnaXN0ZXJlZCBu ZXcgaW50ZXJmYWNlIGRyaXZlciB1c2Itc3RvcmFnZQ0KWyAgICAyLjE5MjQxN10gaTJjIC9kZXYg ZW50cmllcyBkcml2ZXINClsgICAgMi4yMDM4MjRdIGNzMjAwMC1jcCAyLTAwNGY6IHJldmlzaW9u IC0gQzENClsgICAgMi4yMDgwNTFdIGkyYy1yY2FyIGU2NTEwMDAwLmkyYzogcHJvYmVkDQpbICAg IDIuMjEyMzk3XSBwY2E5NTN4IDQtMDAyMDogNC0wMDIwIHN1cHBseSB2Y2Mgbm90IGZvdW5kLCB1 c2luZyBkdW1teSByZWd1bGF0b3INClsgICAgMi4yMjAzOTldIGkyYy1yY2FyIGU2NmQ4MDAwLmky YzogcHJvYmVkDQpbICAgIDIuMjMxMDIyXSByY2FyX2dlbjNfdGhlcm1hbCBlNjE5ODAwMC50aGVy bWFsOiBUU0MwOiBMb2FkZWQgMSB0cmlwIHBvaW50cw0KWyAgICAyLjI0MjA0OV0gcmNhcl9nZW4z X3RoZXJtYWwgZTYxOTgwMDAudGhlcm1hbDogVFNDMTogTG9hZGVkIDEgdHJpcCBwb2ludHMNClsg ICAgMi4yNTMwNTFdIHJjYXJfZ2VuM190aGVybWFsIGU2MTk4MDAwLnRoZXJtYWw6IFRTQzI6IExv YWRlZCAyIHRyaXAgcG9pbnRzDQpbICAgIDIuMjYyNTI1XSBjcHVmcmVxOiBjcHVmcmVxX29ubGlu ZTogQ1BVMDogUnVubmluZyBhdCB1bmxpc3RlZCBmcmVxOiAxNDk5OTk5IEtIeg0KWyAgICAyLjI2 OTk1NF0gY3B1ZnJlcTogY3B1ZnJlcV9vbmxpbmU6IENQVTA6IFVubGlzdGVkIGluaXRpYWwgZnJl cXVlbmN5IGNoYW5nZWQgdG86IDE1MDAwMDAgS0h6DQpbICAgIDIuMjc4ODQyXSBjcHVmcmVxOiBj cHVmcmVxX29ubGluZTogQ1BVNDogUnVubmluZyBhdCB1bmxpc3RlZCBmcmVxOiAxMTk5OTk5IEtI eg0KWyAgICAyLjI4NjUzN10gY3B1ZnJlcTogY3B1ZnJlcV9vbmxpbmU6IENQVTQ6IFVubGlzdGVk IGluaXRpYWwgZnJlcXVlbmN5IGNoYW5nZWQgdG86IDEyMDAwMDAgS0h6DQpbICAgIDIuMjk1ODY0 XSBzZGhjaTogU2VjdXJlIERpZ2l0YWwgSG9zdCBDb250cm9sbGVyIEludGVyZmFjZSBkcml2ZXIN ClsgICAgMi4zMDIwNDhdIHNkaGNpOiBDb3B5cmlnaHQoYykgUGllcnJlIE9zc21hbg0KWyAgICAy LjMwNzAyMV0gcmVuZXNhc19zZGhpX2ludGVybmFsX2RtYWMgZWUxMDAwMDAuc2Q6IEdvdCBDRCBH 
UElPDQpbICAgIDIuMzEyOTU5XSByZW5lc2FzX3NkaGlfaW50ZXJuYWxfZG1hYyBlZTEwMDAwMC5z ZDogR290IFdQIEdQSU8NClsgICAgMi4zODk4NThdIHJlbmVzYXNfc2RoaV9pbnRlcm5hbF9kbWFj IGVlMTQwMDAwLnNkOiBJUlEgaW5kZXggMSBub3QgZm91bmQNClsgICAgMi4zOTY2NTNdIHJlbmVz YXNfc2RoaV9pbnRlcm5hbF9kbWFjIGVlMTQwMDAwLnNkOiBtbWMwIGJhc2UgYXQgMHhlZTE0MDAw MCBtYXggY2xvY2sgcmF0ZSAyMDAgTUh6DQpbICAgIDIuNDA1OTY0XSByZW5lc2FzX3NkaGlfaW50 ZXJuYWxfZG1hYyBlZTE2MDAwMC5zZDogR290IENEIEdQSU8NClsgICAgMi40MTE5MDRdIHJlbmVz YXNfc2RoaV9pbnRlcm5hbF9kbWFjIGVlMTYwMDAwLnNkOiBHb3QgV1AgR1BJTw0KWyAgICAyLjQx ODIxMV0gU3lub3BzeXMgRGVzaWdud2FyZSBNdWx0aW1lZGlhIENhcmQgSW50ZXJmYWNlIERyaXZl cg0KWyAgICAyLjQyNTE3OF0gc2RoY2ktcGx0Zm06IFNESENJIHBsYXRmb3JtIGFuZCBPRiBkcml2 ZXIgaGVscGVyDQpbICAgIDIuNDMyNDk0XSBsZWR0cmlnLWNwdTogcmVnaXN0ZXJlZCB0byBpbmRp Y2F0ZSBhY3Rpdml0eSBvbiBDUFVzDQpbICAgIDIuNDM5NTcxXSB1c2Jjb3JlOiByZWdpc3RlcmVk IG5ldyBpbnRlcmZhY2UgZHJpdmVyIHVzYmhpZA0KWyAgICAyLjQ0NTE0NF0gdXNiaGlkOiBVU0Ig SElEIGNvcmUgZHJpdmVyDQpbICAgIDIuNDUyNDUzXSBORVQ6IFJlZ2lzdGVyZWQgcHJvdG9jb2wg ZmFtaWx5IDE3DQpbICAgIDIuNDU3MDExXSA5cG5ldDogSW5zdGFsbGluZyA5UDIwMDAgc3VwcG9y dA0KWyAgICAyLjQ2MTMxOV0gS2V5IHR5cGUgZG5zX3Jlc29sdmVyIHJlZ2lzdGVyZWQNClsgICAg Mi40NjU3OTldIHJlZ2lzdGVyZWQgdGFza3N0YXRzIHZlcnNpb24gMQ0KWyAgICAyLjQ2OTg5OF0g TG9hZGluZyBjb21waWxlZC1pbiBYLjUwOSBjZXJ0aWZpY2F0ZXMNClsgICAgMi40ODI4OTddIHJl bmVzYXNfaXJxYyBlNjFjMDAwMC5pbnRlcnJ1cHQtY29udHJvbGxlcjogZHJpdmluZyA2IGlycXMN ClsgICAgMi40OTU2MDBdIGJkOTU3MW13diA3LTAwMzA6IERldmljZTogQkQ5NTcxTVdWIHJldi4g MQ0KWyAgICAyLjUxNTAzMV0gbW1jMDogbmV3IEhTNDAwIE1NQyBjYXJkIGF0IGFkZHJlc3MgMDAw MQ0KWyAgICAyLjUyMDQxOF0gbW1jYmxrMDogbW1jMDowMDAxIEJHU0QzUiAyOS4xIEdpQiANClsg ICAgMi41MjUxMzFdIG1tY2JsazBib290MDogbW1jMDowMDAxIEJHU0QzUiBwYXJ0aXRpb24gMSAx Ni4wIE1pQg0KWyAgICAyLjUyOTU3NV0gZWhjaS1wbGF0Zm9ybSBlZTA4MDEwMC51c2I6IEVIQ0kg SG9zdCBDb250cm9sbGVyDQpbICAgIDIuNTMxMjA3XSBtbWNibGswYm9vdDE6IG1tYzA6MDAwMSBC R1NEM1IgcGFydGl0aW9uIDIgMTYuMCBNaUINClsgICAgMi41MzY3MThdIGVoY2ktcGxhdGZvcm0g 
ZWUwODAxMDAudXNiOiBuZXcgVVNCIGJ1cyByZWdpc3RlcmVkLCBhc3NpZ25lZCBidXMgbnVtYmVy IDUNClsgICAgMi41NDI3MzRdIG1tY2JsazBycG1iOiBtbWMwOjAwMDEgQkdTRDNSIHBhcnRpdGlv biAzIDQuMDAgTWlCLCBjaGFyZGV2ICgyMzc6MCkNClsgICAgMi41NTA0OTldIGVoY2ktcGxhdGZv cm0gZWUwODAxMDAudXNiOiBpcnEgMTY0LCBpbyBtZW0gMHhlZTA4MDEwMA0KWyAgICAyLjU1ODM1 N10gIG1tY2JsazA6IHAxDQpbICAgIDIuNTc3NTYwXSBlaGNpLXBsYXRmb3JtIGVlMDgwMTAwLnVz YjogVVNCIDIuMCBzdGFydGVkLCBFSENJIDEuMTANClsgICAgMi41ODQwODRdIGh1YiA1LTA6MS4w OiBVU0IgaHViIGZvdW5kDQpbICAgIDIuNTg3ODUxXSBodWIgNS0wOjEuMDogMSBwb3J0IGRldGVj dGVkDQpbICAgIDIuNTkyODQ5XSBvaGNpLXBsYXRmb3JtIGVlMDgwMDAwLnVzYjogR2VuZXJpYyBQ bGF0Zm9ybSBPSENJIGNvbnRyb2xsZXINClsgICAgMi41OTk1NjldIG9oY2ktcGxhdGZvcm0gZWUw ODAwMDAudXNiOiBuZXcgVVNCIGJ1cyByZWdpc3RlcmVkLCBhc3NpZ25lZCBidXMgbnVtYmVyIDYN ClsgICAgMi42MDc0NDZdIG9oY2ktcGxhdGZvcm0gZWUwODAwMDAudXNiOiBpcnEgMTY0LCBpbyBt ZW0gMHhlZTA4MDAwMA0KWyAgICAyLjcwNDUyOF0gaHViIDYtMDoxLjA6IFVTQiBodWIgZm91bmQN ClsgICAgMi43MDgyOTVdIGh1YiA2LTA6MS4wOiAxIHBvcnQgZGV0ZWN0ZWQNClsgICAgMi43MTMz NDJdIHJlbmVzYXNfc2RoaV9pbnRlcm5hbF9kbWFjIGVlMTAwMDAwLnNkOiBHb3QgQ0QgR1BJTw0K WyAgICAyLjcxOTI4M10gcmVuZXNhc19zZGhpX2ludGVybmFsX2RtYWMgZWUxMDAwMDAuc2Q6IEdv dCBXUCBHUElPDQpbICAgIDIuNzk1NzEzXSByZW5lc2FzX3NkaGlfaW50ZXJuYWxfZG1hYyBlZTEw MDAwMC5zZDogSVJRIGluZGV4IDEgbm90IGZvdW5kDQpbICAgIDIuODAyNTA5XSByZW5lc2FzX3Nk aGlfaW50ZXJuYWxfZG1hYyBlZTEwMDAwMC5zZDogbW1jMSBiYXNlIGF0IDB4ZWUxMDAwMDAgbWF4 IGNsb2NrIHJhdGUgMjAwIE1Ieg0KWyAgICAyLjgxMjM4OV0gcmVuZXNhc19zZGhpX2ludGVybmFs X2RtYWMgZWUxNjAwMDAuc2Q6IEdvdCBDRCBHUElPDQpbICAgIDIuODE4MzM3XSByZW5lc2FzX3Nk aGlfaW50ZXJuYWxfZG1hYyBlZTE2MDAwMC5zZDogR290IFdQIEdQSU8NClsgICAgMi44OTQ2ODNd IHJlbmVzYXNfc2RoaV9pbnRlcm5hbF9kbWFjIGVlMTYwMDAwLnNkOiBJUlEgaW5kZXggMSBub3Qg Zm91bmQNClsgICAgMi45MDE0NzddIHJlbmVzYXNfc2RoaV9pbnRlcm5hbF9kbWFjIGVlMTYwMDAw LnNkOiBtbWMyIGJhc2UgYXQgMHhlZTE2MDAwMCBtYXggY2xvY2sgcmF0ZSAyMDAgTUh6DQpbICAg IDIuOTE0MDk2XSByY2FyLWRtYWMgZTY3MDAwMDAuZG1hLWNvbnRyb2xsZXI6IGlnbm9yaW5nIGRl 
cGVuZGVuY3kgZm9yIGRldmljZSwgYXNzdW1pbmcgbm8gZHJpdmVyDQpbICAgIDIuOTI1MDAxXSBy Y2FyLWRtYWMgZTczMDAwMDAuZG1hLWNvbnRyb2xsZXI6IGlnbm9yaW5nIGRlcGVuZGVuY3kgZm9y IGRldmljZSwgYXNzdW1pbmcgbm8gZHJpdmVyDQpbICAgIDIuOTM1Nzg4XSByY2FyLWRtYWMgZTcz MTAwMDAuZG1hLWNvbnRyb2xsZXI6IGlnbm9yaW5nIGRlcGVuZGVuY3kgZm9yIGRldmljZSwgYXNz dW1pbmcgbm8gZHJpdmVyDQpbICAgIDIuOTQ2NjIxXSByY2FyLWRtYWMgZWM3MDAwMDAuZG1hLWNv bnRyb2xsZXI6IGlnbm9yaW5nIGRlcGVuZGVuY3kgZm9yIGRldmljZSwgYXNzdW1pbmcgbm8gZHJp dmVyDQpbICAgIDIuOTU3NDEzXSByY2FyLWRtYWMgZWM3MjAwMDAuZG1hLWNvbnRyb2xsZXI6IGln bm9yaW5nIGRlcGVuZGVuY3kgZm9yIGRldmljZSwgYXNzdW1pbmcgbm8gZHJpdmVyDQpbICAgIDIu OTY4NDI2XSBzYXRhX3JjYXIgZWUzMDAwMDAuc2F0YTogaWdub3JpbmcgZGVwZW5kZW5jeSBmb3Ig ZGV2aWNlLCBhc3N1bWluZyBubyBkcml2ZXINClsgICAgMi45NzY4NzVdIHNjc2kgaG9zdDA6IHNh dGFfcmNhcg0KWyAgICAyLjk4MDM0OF0gYXRhMTogU0FUQSBtYXggVURNQS8xMzMgaXJxIDE3MA0K WyAgICAyLjk4NTI5OV0gcmF2YiBlNjgwMDAwMC5ldGhlcm5ldDogaWdub3JpbmcgZGVwZW5kZW5j eSBmb3IgZGV2aWNlLCBhc3N1bWluZyBubyBkcml2ZXINClsgICAgMi45OTM1MTJdIGxpYnBoeTog cmF2Yl9taWk6IHByb2JlZA0KWyAgICAyLjk5ODI3OF0gcmF2YiBlNjgwMDAwMC5ldGhlcm5ldCBl dGgwOiBCYXNlIGFkZHJlc3MgYXQgMHhlNjgwMDAwMCwgMmU6MDk6MGE6MDA6ODM6ZWEsIElSUSAx MTYuDQpbICAgIDMuMDA4NjI0XSBpbnB1dDoga2V5cyBhcyAvZGV2aWNlcy9wbGF0Zm9ybS9rZXlz L2lucHV0L2lucHV0MA0KWyAgICAzLjAxNDcxM10gaGN0b3N5czogdW5hYmxlIHRvIG9wZW4gcnRj IGRldmljZSAocnRjMCkNClsgICAgMy4wOTY1MTBdIE1pY3JlbCBLU1o5MDMxIEdpZ2FiaXQgUEhZ IGU2ODAwMDAwLmV0aGVybmV0LWZmZmZmZmZmOjAwOiBhdHRhY2hlZCBQSFkgZHJpdmVyIFtNaWNy ZWwgS1NaOTAzMSBHaWdhYml0IFBIWV0gKG1paV9idXM6cGh5X2FkZHI9ZTY4MDAwMDAuZXRoZXJu ZXQtZmZmZmZmZmY6MDAsIGlycT0xNzUpDQpbICAgIDMuNDAxNTY0XSBhdGExOiBsaW5rIHJlc3Vt ZSBzdWNjZWVkZWQgYWZ0ZXIgMSByZXRyaWVzDQpbICAgIDMuNTEzMDcyXSBhdGExOiBTQVRBIGxp bmsgZG93biAoU1N0YXR1cyAwIFNDb250cm9sIDMwMCkNClsgICAgNC43NDIwNTldIHJhdmIgZTY4 MDAwMDAuZXRoZXJuZXQgZXRoMDogTGluayBpcyBVcCAtIDEwME1icHMvRnVsbCAtIGZsb3cgY29u dHJvbCBvZmYNClsgICAgNC43NzM1NTNdIFNlbmRpbmcgREhDUCByZXF1ZXN0cyAuLiwNClsgICAg 
7.413975] random: fast init done
[    7.421550]  OK
[    7.423320] IP-Config: Got DHCP answer from 192.168.44.74, my address is 192.168.44.104
[    7.431336] IP-Config: Complete:
[    7.434568]      device=eth0, hwaddr=2e:09:0a:00:83:ea, ipaddr=192.168.44.104, mask=255.255.255.0, gw=192.168.44.74
[    7.445000]      host=192.168.44.104, domain=shimoda-i7.org, nis-domain=(none)
[    7.452218]      bootserver=192.168.44.74, rootserver=192.168.44.74, rootpath=/var/lib/tftpboot/aarch64/rootfs/buildroot
[    7.452220]      nameserver0=192.168.44.74
[    7.467553] SDHI0 Vcc: disabling
[    7.470782] SDHI3 Vcc: disabling
[    7.474008] SDHI0 VccQ: disabling
[    7.477316] SDHI3 VccQ: disabling
[    7.480632] ALSA device list:
[    7.483598]   No soundcards found.
[    7.492496] VFS: Mounted root (nfs filesystem) on device 0:19.
[    7.498742] devtmpfs: mounted
[    7.504263] Freeing unused kernel memory: 4992K
[    7.513642] Run /sbin/init as init process
[    7.843871] Unable to handle kernel paging request at virtual address 0000000056000000
[    7.851797] Mem abort info:
[    7.854589]   ESR = 0x96000004
[    7.857642]   EC = 0x25: DABT (current EL), IL = 32 bits
[    7.862950]   SET = 0, FnV = 0
[    7.866001]   EA = 0, S1PTW = 0
[    7.869134] Data abort info:
[    7.872011]   ISV = 0, ISS = 0x00000004
[    7.875842]   CM = 0, WnR = 0
[    7.878806] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000774787000
[    7.885242] [0000000056000000] pgd=0000000000000000
[    7.890119] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[    7.895684] Modules linked in:
[    7.898737] CPU: 2 PID: 1 Comm: systemd Not tainted 5.3.0-rc6-next-20190902-00001-g9709468 #48
[    7.907340] Hardware name: Renesas Salvator-X board based on r8a7795 ES2.0+ (DT)
[    7.914729] pstate: 20000005 (nzCv daif -PAN -UAO)
[    7.919523] pc : dput+0x38/0x2e8
[    7.922743] lr : dput+0x34/0x2e8
[    7.925964] sp : ffff80001006bba0
[    7.929270] x29: ffff80001006bba0 x28: ffff000735c98000
[    7.934576] x27: 0000000000000000 x26: 0000000000000000
[    7.939881] x25: 0000000056000000 x24: 0000000000004000
[    7.945186] x23: 0000000000000001 x22: 0000000000080060
[    7.950491] x21: 0000000000080040 x20: 0000000056000058
[    7.955795] x19: 0000000056000000 x18: 0000000000000000
[    7.961099] x17: 0000000000000000 x16: 0000000000000000
[    7.966403] x15: 0000000000000000 x14: 0000000000000000
[    7.971707] x13: 0000000000000000 x12: fefefefefefefeff
[    7.977011] x11: 0000ffffa01018b8 x10: 0000ffffa01018b8
[    7.982315] x9 : 6bff3a3a375c19ff x8 : 00ffffa01018b800
[    7.987620] x7 : 0000000000000000 x6 : 0000000000000000
[    7.992924] x5 : 0000000000000064 x4 : 0000000c00000000
[    7.998228] x3 : 0000000000000001 x2 : 0000000000000082
[    8.003532] x1 : ffff000735c98000 x0 : 0000000000000001
[    8.008838] Call trace:
[    8.011278]  dput+0x38/0x2e8
[    8.014155]  terminate_walk+0xf4/0x120
[    8.017897]  path_lookupat+0xf8/0x1f8
[    8.021553]  filename_lookup+0x8c/0x160
[    8.025382]  user_path_at_empty+0x48/0x58
[    8.029387]  __arm64_sys_name_to_handle_at+0x64/0x2d0
[    8.034435]  el0_svc_common+0x68/0x178
[    8.038177]  el0_svc_handler+0x24/0x98
[    8.041920]  el0_svc+0x8/0xc
[    8.044798] Code: 72a00115 52800037 97fb26b4 91016274 (b9400260)
[    8.050895] ---[ end trace dd06490ec981282b ]---
[    8.055966] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    8.063619] SMP: stopping secondary CPUs
[    8.067539] Kernel Offset: disabled
[    8.071021] CPU features: 0x0002,21006004
[    8.075022] Memory Limit: none
[    8.078076] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Best regards,
Yoshihiro Shimoda
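As a rough aid for reading the abort information in the oops log above, the sketch below splits the reported value ESR = 0x96000004 into the fields the kernel prints (EC, IL, ISS). The helper names are made up for this example and are not kernel functions; the bit positions (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and the fault status code in ISS bits 5:0 for a data abort) are taken from the Armv8-A ESR_ELx layout.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only (not kernel API). */

/* Exception Class: bits 31:26. 0x25 is a data abort taken from the current EL. */
static unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Instruction Length: bit 25. 1 means a 32-bit instruction. */
static unsigned int esr_il(uint32_t esr)
{
	return (esr >> 25) & 0x1;
}

/* Instruction Specific Syndrome: bits 24:0. */
static uint32_t esr_iss(uint32_t esr)
{
	return esr & 0x1ffffff;
}

/* Data Fault Status Code: ISS bits 5:0 (meaningful for data aborts). */
static unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}
```

Applied to 0x96000004, these helpers give EC = 0x25, IL = 1 and ISS = 0x4, matching the "Mem abort info" lines. A fault status code of 0x04 is a level 0 translation fault, which is consistent with the empty pgd entry reported for address 0000000056000000.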
* Re: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
  2019-09-03  8:53 ` Yoshihiro Shimoda
@ 2019-09-03  9:37 ` Greg Kroah-Hartman
From: Greg Kroah-Hartman @ 2019-09-03 9:37 UTC
To: Yoshihiro Shimoda
Cc: David Howells, viro, Casey Schaufler, Stephen Smalley,
	nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb,
	linux-security-module, linux-fsdevel, linux-api, linux-block,
	linux-kernel

On Tue, Sep 03, 2019 at 08:53:31AM +0000, Yoshihiro Shimoda wrote:
> Hi,
>
> > From: David Howells, Sent: Friday, August 30, 2019 10:58 PM
> <snip>
> > diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
> > index 9063ede411ae..b8572e4d6a1b 100644
> > --- a/drivers/usb/core/devio.c
> > +++ b/drivers/usb/core/devio.c
> > @@ -41,6 +41,7 @@
> >  #include <linux/dma-mapping.h>
> >  #include <asm/byteorder.h>
> >  #include <linux/moduleparam.h>
> > +#include <linux/watch_queue.h>
> >
> >  #include "usb.h"
> >
> > @@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev)
> >  	}
> >  }
> >
> > +#ifdef CONFIG_USB_NOTIFICATIONS
> > +static noinline void post_usb_notification(const char *devname,
> > +					   enum usb_notification_type subtype,
> > +					   u32 error)
> > +{
> > +	unsigned int gran = WATCH_LENGTH_GRANULARITY;
> > +	unsigned int name_len, n_len;
> > +	u64 id = 0; /* Might want to put a dev# here. */
> > +
> > +	struct {
> > +		struct usb_notification n;
> > +		char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
> > +			       (sizeof(struct usb_notification) -
> > +				offsetof(struct usb_notification, name))];
> > +	} n;
> > +
> > +	name_len = strlen(devname);
> > +	name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
> > +	n_len = round_up(offsetof(struct usb_notification, name) + name_len,
> > +			 gran) / gran;
> > +
> > +	memset(&n, 0, sizeof(n));
> > +	memcpy(n.n.name, devname, n_len);
> > +
> > +	n.n.watch.type = WATCH_TYPE_USB_NOTIFY;
> > +	n.n.watch.subtype = subtype;
> > +	n.n.watch.info = n_len;
> > +	n.n.error = error;
> > +	n.n.name_len = name_len;
> > +
> > +	post_device_notification(&n.n.watch, id);
> > +}
> > +
> > +void post_usb_device_notification(const struct usb_device *udev,
> > +				  enum usb_notification_type subtype, u32 error)
> > +{
> > +	post_usb_notification(dev_name(&udev->dev), subtype, error);
> > +}
> > +
> > +void post_usb_bus_notification(const struct usb_bus *ubus,
>
> This function's argument is struct usb_bus *, but ...
>
> > +			       enum usb_notification_type subtype, u32 error)
> > +{
> > +	post_usb_notification(ubus->bus_name, subtype, error);
> > +}
> > +#endif
> > +
> >  static int usbdev_notify(struct notifier_block *self,
> >  			 unsigned long action, void *dev)
> >  {
> >  	switch (action) {
> >  	case USB_DEVICE_ADD:
> > +		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
> >  		break;
> >  	case USB_DEVICE_REMOVE:
> > +		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
> > +		usbdev_remove(dev);
> > +		break;
> > +	case USB_BUS_ADD:
> > +		post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
> > +		break;
> > +	case USB_BUS_REMOVE:
> > +		post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
> >  		usbdev_remove(dev);
>
> this function calls usbdev_remove() with incorrect argument if the action
> is USB_BUS_REMOVE. So, this seems to cause the following issue [1] on
> my environment (R-Car H3 / r8a7795 on next-20190902) [2]. However, I have
> no idea how to fix the issue, so I report this issue at the first step.

As a few of us just discussed this on IRC, these bus notifiers should
probably be dropped as these are the incorrect structure type as you
found out.  Thanks for the report.

greg k-h
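The type mismatch reported above can be illustrated with a small userspace sketch. This is not the kernel code and not a proposed fix: `struct fake_usb_device`, `struct fake_usb_bus`, `fake_notify()` and the `EV_*` constants are hypothetical stand-ins, assumed only for illustration. The point is that a notifier callback receiving a `void *` should hand the pointer only to code that expects the type implied by the action, so a bus event never reaches a helper written for devices.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct usb_device and struct usb_bus. */
struct fake_usb_device {
	char name[16];
	int removed;
};

struct fake_usb_bus {
	char bus_name[16];
};

enum fake_event {
	EV_DEVICE_ADD,
	EV_DEVICE_REMOVE,
	EV_BUS_ADD,
	EV_BUS_REMOVE,
};

/* Only valid for device pointers, like usbdev_remove() in the quoted patch. */
static void fake_device_remove(struct fake_usb_device *udev)
{
	udev->removed = 1;
}

/*
 * Returns the name that would be posted for the event, or NULL for an
 * unknown event. The void pointer is interpreted according to the
 * event, and the bus cases never fall into the device-only helper.
 */
static const char *fake_notify(enum fake_event ev, void *obj)
{
	assert(obj != NULL);

	switch (ev) {
	case EV_DEVICE_ADD:
		return ((struct fake_usb_device *)obj)->name;
	case EV_DEVICE_REMOVE: {
		struct fake_usb_device *udev = obj;

		fake_device_remove(udev);
		return udev->name;
	}
	case EV_BUS_ADD:
	case EV_BUS_REMOVE:
		/* obj is a bus here; fake_device_remove() must not be called. */
		return ((struct fake_usb_bus *)obj)->bus_name;
	}
	return NULL;
}
```

In the quoted hunk, the `USB_BUS_REMOVE` case ends up at the existing `usbdev_remove(dev)` call, which is the situation the separate bus cases in this sketch avoid.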
* RE: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
  2019-09-03  9:37 ` Greg Kroah-Hartman
@ 2019-09-04  1:53   ` Yoshihiro Shimoda
From: Yoshihiro Shimoda @ 2019-09-04 1:53 UTC
To: Greg Kroah-Hartman
Cc: David Howells, viro, Casey Schaufler, Stephen Smalley, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Hi Greg,

> From: Greg Kroah-Hartman, Sent: Tuesday, September 3, 2019 6:37 PM
<snip>
> > > +void post_usb_bus_notification(const struct usb_bus *ubus,
> >
> > This function's argument is struct usb_bus *, but ...
> >
> > > +                                  enum usb_notification_type subtype, u32 error)
> > > +{
> > > +        post_usb_notification(ubus->bus_name, subtype, error);
> > > +}
> > > +#endif
> > > +
> > >  static int usbdev_notify(struct notifier_block *self,
> > >                           unsigned long action, void *dev)
> > >  {
> > >          switch (action) {
> > >          case USB_DEVICE_ADD:
> > > +                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
> > >                  break;
> > >          case USB_DEVICE_REMOVE:
> > > +                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
> > > +                usbdev_remove(dev);
> > > +                break;
> > > +        case USB_BUS_ADD:
> > > +                post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
> > > +                break;
> > > +        case USB_BUS_REMOVE:
> > > +                post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
> > >                  usbdev_remove(dev);
> >
> > this function calls usbdev_remove() with incorrect argument if the action
> > is USB_BUS_REMOVE. So, this seems to cause the following issue [1] on
> > my environment (R-Car H3 / r8a7795 on next-20190902) [2]. However, I have
> > no idea how to fix the issue, so I report this issue at the first step.
>
> As a few of us just discussed this on IRC, these bus notifiers should
> probably be dropped as these are the incorrect structure type as you
> found out. Thanks for the report.

Thank you for the discussion. I got it.

Best regards,
Yoshihiro Shimoda

> greg k-h
* Re: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
  2019-08-30 13:58 ` David Howells
@ 2019-09-03 12:51   ` Guenter Roeck
From: Guenter Roeck @ 2019-09-03 12:51 UTC
To: David Howells
Cc: viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

On Fri, Aug 30, 2019 at 02:58:23PM +0100, David Howells wrote:
> Add a USB subsystem notification mechanism whereby notifications about
> hardware events such as device connection, disconnection, reset and I/O
> errors, can be reported to a monitoring process asynchronously.
>
> Firstly, an event queue needs to be created:
>
>         fd = open("/dev/event_queue", O_RDWR);
>         ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);
>
> then a notification can be set up to report USB notifications via that
> queue:
>
>         struct watch_notification_filter filter = {
>                 .nr_filters = 1,
>                 .filters = {
>                         [0] = {
>                                 .type = WATCH_TYPE_USB_NOTIFY,
>                                 .subtype_filter[0] = UINT_MAX;
>                         },
>                 },
>         };
>         ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
>         notify_devices(fd, 12);
>
> After that, records will be placed into the queue when events occur on a
> USB device or bus. Records are of the following format:
>
>         struct usb_notification {
>                 struct watch_notification watch;
>                 __u32   error;
>                 __u32   reserved;
>                 __u8    name_len;
>                 __u8    name[0];
>         } *n;
>
> Where:
>
>         n->watch.type will be WATCH_TYPE_USB_NOTIFY
>
>         n->watch.subtype will be the type of notification, such as
>         NOTIFY_USB_DEVICE_ADD.
>
>         n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
>         record.
>
>         n->watch.info & WATCH_INFO_ID will be the second argument to
>         device_notify(), shifted.
>
>         n->error and n->reserved are intended to convey information such as
>         error codes, but are currently not used
>
>         n->name_len and n->name convey the USB device name as an
>         unterminated string. This may be truncated - it is currently
>         limited to a maximum 63 chars.
>
> Note that it is permissible for event records to be of variable length -
> or, at least, the length may be dependent on the subtype.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> cc: linux-usb@vger.kernel.org
> ---
>
>  Documentation/watch_queue.rst    |    9 ++++++
>  drivers/usb/core/Kconfig         |    9 ++++++
>  drivers/usb/core/devio.c         |   56 ++++++++++++++++++++++++++++++++++++++
>  drivers/usb/core/hub.c           |    4 +++
>  include/linux/usb.h              |   18 ++++++++++++
>  include/uapi/linux/watch_queue.h |   30 ++++++++++++++++++++
>  6 files changed, 125 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
> index 5cc9c6924727..4087a8e670a8 100644
> --- a/Documentation/watch_queue.rst
> +++ b/Documentation/watch_queue.rst
> @@ -11,6 +11,8 @@ receive notifications from the kernel. This can be used in conjunction with::
>
>    * Block layer event notifications
>
> +  * USB subsystem event notifications
> +
>
>  The notifications buffers can be enabled by:
>
> @@ -315,6 +317,13 @@ Any particular buffer can be fed from multiple sources. Sources include:
>      or temporary link loss. Watches of this type are set on the global device
>      watch list.
>
> +  * WATCH_TYPE_USB_NOTIFY
> +
> +    Notifications of this type indicate USB subsystem events, such as
> +    attachment, removal, reset and I/O errors. Separate events are generated
> +    for buses and devices. Watchpoints of this type are set on the global
> +    device watch list.
> +
>
>  Event Filtering
>  ===============
> diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig
> index ecaacc8ed311..57e7b649e48b 100644
> --- a/drivers/usb/core/Kconfig
> +++ b/drivers/usb/core/Kconfig
> @@ -102,3 +102,12 @@ config USB_AUTOSUSPEND_DELAY
>            The default value Linux has always had is 2 seconds. Change
>            this value if you want a different delay and cannot modify
>            the command line or module parameter.
> +
> +config USB_NOTIFICATIONS
> +        bool "Provide USB hardware event notifications"
> +        depends on USB && DEVICE_NOTIFICATIONS
> +        help
> +          This option provides support for getting hardware event notifications
> +          on USB devices and interfaces. This makes use of the
> +          /dev/watch_queue misc device to handle the notification buffer.
> +          device_notify(2) is used to set/remove watches.
> diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
> index 9063ede411ae..b8572e4d6a1b 100644
> --- a/drivers/usb/core/devio.c
> +++ b/drivers/usb/core/devio.c
> @@ -41,6 +41,7 @@
>  #include <linux/dma-mapping.h>
>  #include <asm/byteorder.h>
>  #include <linux/moduleparam.h>
> +#include <linux/watch_queue.h>
>
>  #include "usb.h"
>
> @@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev)
>          }
>  }
>
> +#ifdef CONFIG_USB_NOTIFICATIONS
> +static noinline void post_usb_notification(const char *devname,
> +                                           enum usb_notification_type subtype,
> +                                           u32 error)
> +{
> +        unsigned int gran = WATCH_LENGTH_GRANULARITY;
> +        unsigned int name_len, n_len;
> +        u64 id = 0; /* Might want to put a dev# here. */
> +
> +        struct {
> +                struct usb_notification n;
> +                char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
> +                               (sizeof(struct usb_notification) -
> +                                offsetof(struct usb_notification, name))];
> +        } n;
> +
> +        name_len = strlen(devname);
> +        name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
> +        n_len = round_up(offsetof(struct usb_notification, name) + name_len,
> +                         gran) / gran;
> +
> +        memset(&n, 0, sizeof(n));
> +        memcpy(n.n.name, devname, n_len);
> +
> +        n.n.watch.type = WATCH_TYPE_USB_NOTIFY;
> +        n.n.watch.subtype = subtype;
> +        n.n.watch.info = n_len;
> +        n.n.error = error;
> +        n.n.name_len = name_len;
> +
> +        post_device_notification(&n.n.watch, id);
> +}
> +
> +void post_usb_device_notification(const struct usb_device *udev,
> +                                  enum usb_notification_type subtype, u32 error)
> +{
> +        post_usb_notification(dev_name(&udev->dev), subtype, error);
> +}
> +
> +void post_usb_bus_notification(const struct usb_bus *ubus,
> +                               enum usb_notification_type subtype, u32 error)
> +{
> +        post_usb_notification(ubus->bus_name, subtype, error);
> +}
> +#endif
> +
>  static int usbdev_notify(struct notifier_block *self,
>                           unsigned long action, void *dev)
>  {
>          switch (action) {
>          case USB_DEVICE_ADD:
> +                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
>                  break;
>          case USB_DEVICE_REMOVE:
> +                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
> +                usbdev_remove(dev);
> +                break;
> +        case USB_BUS_ADD:
> +                post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
> +                break;
> +        case USB_BUS_REMOVE:
> +                post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
>                  usbdev_remove(dev);

This added call to usbdev_remove() results in a crash when running
the qemu "tosa" emulation. Removing the call fixes the problem.

Guenter
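The length bookkeeping described in the quoted commit message (a record length carried in watch.info, a name truncated to at most 63 characters) can be sketched in plain user-space C. This is only an illustration under stated assumptions: the 8-byte granule is a placeholder value, and the demo_* names and the record layout are invented here rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GRANULE      8u   /* assumed granule size in bytes (placeholder) */
#define DEMO_MAX_NAME_LEN 63u  /* cap mentioned in the quoted commit message */

/* Hypothetical record: fixed header followed by an unterminated name. */
struct demo_record {
    uint32_t info;      /* record length, counted in granules */
    uint32_t error;
    uint8_t  name_len;  /* number of valid bytes in name[] */
    uint8_t  name[DEMO_MAX_NAME_LEN];
};

/* Round a byte count up to a whole number of granules. */
static unsigned int demo_len_in_granules(size_t bytes)
{
    return (unsigned int)((bytes + DEMO_GRANULE - 1) / DEMO_GRANULE);
}

/* Fill a record, truncating the name to the cap. */
static void demo_fill_record(struct demo_record *rec, const char *devname,
                             uint32_t error)
{
    size_t name_len = strlen(devname);

    assert(rec != NULL);
    if (name_len > DEMO_MAX_NAME_LEN)
        name_len = DEMO_MAX_NAME_LEN;

    memset(rec, 0, sizeof(*rec));
    memcpy(rec->name, devname, name_len);  /* a byte count, not granules */
    rec->name_len = (uint8_t)name_len;
    rec->error = error;
    rec->info = demo_len_in_granules(offsetof(struct demo_record, name) +
                                     name_len);
}
```

The sketch copies name_len bytes and stores the rounded length separately, so the byte count and the granule count are never interchanged.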
* Re: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
  2019-08-30 13:58 ` David Howells
@ 2019-09-03 16:07   ` David Howells
From: David Howells @ 2019-09-03 16:07 UTC
To: Guenter Roeck
Cc: dhowells, viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Guenter Roeck <linux@roeck-us.net> wrote:

> This added call to usbdev_remove() results in a crash when running
> the qemu "tosa" emulation. Removing the call fixes the problem.

Yeah - I'm going to drop the bus notification messages for now.

David
* Re: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
  2019-09-03 16:07 ` David Howells
@ 2019-09-03 16:12   ` Guenter Roeck
From: Guenter Roeck @ 2019-09-03 16:12 UTC
To: David Howells
Cc: viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

On Tue, Sep 03, 2019 at 05:07:47PM +0100, David Howells wrote:
> Guenter Roeck <linux@roeck-us.net> wrote:
>
> > This added call to usbdev_remove() results in a crash when running
> > the qemu "tosa" emulation. Removing the call fixes the problem.
>
> Yeah - I'm going to drop the bus notification messages for now.
>
It is not the bus notification itself causing problems. It is the
call to usbdev_remove().

Guenter
* Re: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
  2019-09-03 16:07 ` David Howells
@ 2019-09-03 16:29   ` David Howells
From: David Howells @ 2019-09-03 16:29 UTC
To: Guenter Roeck
Cc: dhowells, viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Guenter Roeck <linux@roeck-us.net> wrote:

> > > This added call to usbdev_remove() results in a crash when running
> > > the qemu "tosa" emulation. Removing the call fixes the problem.
> >
> > Yeah - I'm going to drop the bus notification messages for now.
> >
> It is not the bus notification itself causing problems. It is the
> call to usbdev_remove().

Unfortunately, I don't know how to fix it and don't have much time to
investigate it right now - and it's something that can be added back later.

David
* Re: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
From: Alan Stern @ 2019-09-03 17:06 UTC
To: David Howells
Cc: Guenter Roeck, viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

On Tue, 3 Sep 2019, David Howells wrote:

> Guenter Roeck <linux@roeck-us.net> wrote:
>
> > > > This added call to usbdev_remove() results in a crash when running
> > > > the qemu "tosa" emulation. Removing the call fixes the problem.
> > >
> > > Yeah - I'm going to drop the bus notification messages for now.
> > >
> > It is not the bus notification itself causing problems. It is the
> > call to usbdev_remove().
>
> Unfortunately, I don't know how to fix it and don't have much time to
> investigate it right now - and it's something that can be added back later.

The cause of your problem is quite simple:

 static int usbdev_notify(struct notifier_block *self,
                          unsigned long action, void *dev)
 {
         switch (action) {
         case USB_DEVICE_ADD:
+                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
                 break;
         case USB_DEVICE_REMOVE:
+                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
+                usbdev_remove(dev);
+                break;
+        case USB_BUS_ADD:
+                post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
+                break;
+        case USB_BUS_REMOVE:
+                post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
                 usbdev_remove(dev);
                 break;
         }

The original code had usbdev_remove(dev) under the USB_DEVICE_REMOVE
case.  The patch mistakenly moves it, putting it under the
USB_BUS_REMOVE case.

If the usbdev_remove() call were left where it was originally, the
problem would be solved.

Alan Stern
* Re: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
From: Alan Stern @ 2019-09-03 17:17 UTC
To: David Howells
Cc: Guenter Roeck, viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

On Tue, 3 Sep 2019, Alan Stern wrote:

> On Tue, 3 Sep 2019, David Howells wrote:
>
> > Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > > > > This added call to usbdev_remove() results in a crash when running
> > > > > the qemu "tosa" emulation. Removing the call fixes the problem.
> > > >
> > > > Yeah - I'm going to drop the bus notification messages for now.
> > > >
> > > It is not the bus notification itself causing problems. It is the
> > > call to usbdev_remove().
> >
> > Unfortunately, I don't know how to fix it and don't have much time to
> > investigate it right now - and it's something that can be added back later.
>
> The cause of your problem is quite simple:
>
>  static int usbdev_notify(struct notifier_block *self,
>                           unsigned long action, void *dev)
>  {
>          switch (action) {
>          case USB_DEVICE_ADD:
> +                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
>                  break;
>          case USB_DEVICE_REMOVE:
> +                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
> +                usbdev_remove(dev);
> +                break;
> +        case USB_BUS_ADD:
> +                post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
> +                break;
> +        case USB_BUS_REMOVE:
> +                post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
>                  usbdev_remove(dev);
>                  break;
>          }
>
> The original code had usbdev_remove(dev) under the USB_DEVICE_REMOVE
> case.  The patch mistakenly moves it, putting it under the
------------------------------^^^^^

Sorry, I should have said "duplicates" it.

Alan Stern

> USB_BUS_REMOVE case.
>
> If the usbdev_remove() call were left where it was originally, the
> problem would be solved.
>
> Alan Stern
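The diagnosis above can be illustrated with a small userspace sketch of the handler as it would look with usbdev_remove() left only under the USB_DEVICE_REMOVE case. This is not the patch itself: the action values, the post_event() helper and the two counters are stand-ins invented for illustration, and the real handler takes a struct notifier_block argument that is omitted here. The likely reason the duplicated call crashes is that bus events pass a bus object rather than a device as the data pointer, but that is an assumption and is not stated in the thread.

```c
#include <stdio.h>

/* Stand-ins for the kernel's USB notifier actions; the numeric values
 * here are illustrative only. */
enum {
	USB_DEVICE_ADD = 1,
	USB_DEVICE_REMOVE,
	USB_BUS_ADD,
	USB_BUS_REMOVE,
};

/* Hypothetical counters so the behaviour can be checked in userspace. */
static int posted_events;
static int usbdev_remove_calls;

/* Hypothetical replacement for the post_usb_*_notification() calls. */
static void post_event(const char *what)
{
	posted_events++;
	printf("post %s\n", what);
}

/* Stub: the real function tears down per-device usbfs state. */
static void usbdev_remove(void *dev)
{
	(void)dev;
	usbdev_remove_calls++;
}

static int usbdev_notify(unsigned long action, void *dev)
{
	switch (action) {
	case USB_DEVICE_ADD:
		post_event("dev-add");
		break;
	case USB_DEVICE_REMOVE:
		post_event("dev-remove");
		usbdev_remove(dev);	/* the only case that called it originally */
		break;
	case USB_BUS_ADD:
		post_event("bus-add");
		break;
	case USB_BUS_REMOVE:
		post_event("bus-remove");
		/* deliberately no usbdev_remove() here */
		break;
	}
	return 0;
}
```

With this layout a bus add/remove pair posts two events and never reaches usbdev_remove(), while a device remove still calls it exactly once.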
* Re: [PATCH 08/11] usb: Add USB subsystem notifications [ver #7]
From: David Howells @ 2019-09-04 15:17 UTC
To: Alan Stern
Cc: dhowells, Guenter Roeck, viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Alan Stern <stern@rowland.harvard.edu> wrote:

> > > Unfortunately, I don't know how to fix it and don't have much time to
> > > investigate it right now - and it's something that can be added back later.
> >
> > The cause of your problem is quite simple:
> >
> >  static int usbdev_notify(struct notifier_block *self,
> >                           unsigned long action, void *dev)
> >  {
> >          switch (action) {
> >          case USB_DEVICE_ADD:
> > +                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
> >                  break;
> >          case USB_DEVICE_REMOVE:
> > +                post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
> > +                usbdev_remove(dev);
> > +                break;
> > +        case USB_BUS_ADD:
> > +                post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
> > +                break;
> > +        case USB_BUS_REMOVE:
> > +                post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
> >                  usbdev_remove(dev);
> >                  break;
> >          }
> >
> > The original code had usbdev_remove(dev) under the USB_DEVICE_REMOVE
> > case.  The patch mistakenly moves it, putting it under the
> ------------------------------^^^^^
>
> Sorry, I should have said "duplicates" it.

Ah, thanks. I'd already removed the USB bus notifications, so I'll leave them
out for now.

David
* [PATCH 09/11] Add sample notification program [ver #7] 2019-08-30 13:57 ` David Howells (?) @ 2019-08-30 13:58 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel This needs to be linked with -lkeyutils. It is run like: ./watch_test and watches "/" for mount changes and the current session keyring for key changes: # keyctl add user a a @s 1035096409 # keyctl unlink 1035096409 @s producing: # ./watch_test ptrs h=4 t=2 m=20003 NOTIFY[00000004-00000002] ty=0003 sy=0002 i=01000010 KEY 2ffc2e5d change=2[linked] aux=1035096409 ptrs h=6 t=4 m=20003 NOTIFY[00000006-00000004] ty=0003 sy=0003 i=01000010 KEY 2ffc2e5d change=3[unlinked] aux=1035096409 Other events may be produced, such as with a failing disk: ptrs h=5 t=2 m=6000004 NOTIFY[00000005-00000002] ty=0004 sy=0006 i=04000018 BLOCK 00800050 e=6[critical medium] s=5be8 This corresponds to: print_req_error: critical medium error, dev sdf, sector 23528 flags 0 in dmesg. Signed-off-by: David Howells <dhowells@redhat.com> --- samples/Kconfig | 6 + samples/Makefile | 1 samples/watch_queue/Makefile | 8 + samples/watch_queue/watch_test.c | 233 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 248 insertions(+) create mode 100644 samples/watch_queue/Makefile create mode 100644 samples/watch_queue/watch_test.c diff --git a/samples/Kconfig b/samples/Kconfig index c8dacb4dda80..2c3e07addd38 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -169,4 +169,10 @@ config SAMPLE_VFS as mount API and statx(). Note that this is restricted to the x86 arch whilst it accesses system calls that aren't yet in all arches. 
+config SAMPLE_WATCH_QUEUE + bool "Build example /dev/watch_queue notification consumer" + help + Build example userspace program to use the new mount_notify(), + sb_notify() syscalls and the KEYCTL_WATCH_KEY keyctl() function. + endif # SAMPLES diff --git a/samples/Makefile b/samples/Makefile index 7d6e4ca28d69..a61a39047d02 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -20,3 +20,4 @@ obj-$(CONFIG_SAMPLE_TRACE_PRINTK) += trace_printk/ obj-$(CONFIG_VIDEO_PCI_SKELETON) += v4l/ obj-y += vfio-mdev/ subdir-$(CONFIG_SAMPLE_VFS) += vfs +subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue diff --git a/samples/watch_queue/Makefile b/samples/watch_queue/Makefile new file mode 100644 index 000000000000..6ee61e3ca8d2 --- /dev/null +++ b/samples/watch_queue/Makefile @@ -0,0 +1,8 @@ +# List of programs to build +hostprogs-y := watch_test + +# Tell kbuild to always build the programs +always := $(hostprogs-y) + +HOSTCFLAGS_watch_test.o += -I$(objtree)/usr/include +HOSTLDLIBS_watch_test += -lkeyutils diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c new file mode 100644 index 000000000000..6cd7101cb28c --- /dev/null +++ b/samples/watch_queue/watch_test.c @@ -0,0 +1,233 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Use /dev/watch_queue to watch for notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#include <stdbool.h> +#include <stdarg.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <signal.h> +#include <unistd.h> +#include <fcntl.h> +#include <dirent.h> +#include <errno.h> +#include <sys/wait.h> +#include <sys/ioctl.h> +#include <sys/mman.h> +#include <poll.h> +#include <limits.h> +#include <linux/watch_queue.h> +#include <linux/unistd.h> +#include <linux/keyctl.h> + +#ifndef KEYCTL_WATCH_KEY +#define KEYCTL_WATCH_KEY -1 +#endif +#ifndef __NR_watch_devices +#define __NR_watch_devices -1 +#endif + +#define BUF_SIZE 4 + +static long keyctl_watch_key(int key, int watch_fd, int watch_id) +{ + return syscall(__NR_keyctl, KEYCTL_WATCH_KEY, key, watch_fd, watch_id); +} + +static const char *key_subtypes[256] = { + [NOTIFY_KEY_INSTANTIATED] = "instantiated", + [NOTIFY_KEY_UPDATED] = "updated", + [NOTIFY_KEY_LINKED] = "linked", + [NOTIFY_KEY_UNLINKED] = "unlinked", + [NOTIFY_KEY_CLEARED] = "cleared", + [NOTIFY_KEY_REVOKED] = "revoked", + [NOTIFY_KEY_INVALIDATED] = "invalidated", + [NOTIFY_KEY_SETATTR] = "setattr", +}; + +static void saw_key_change(struct watch_notification *n) +{ + struct key_notification *k = (struct key_notification *)n; + unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + + if (len != sizeof(struct key_notification) / WATCH_LENGTH_GRANULARITY) + return; + + printf("KEY %08x change=%u[%s] aux=%u\n", + k->key_id, n->subtype, key_subtypes[n->subtype], k->aux); +} + +static const char *block_subtypes[256] = { + [NOTIFY_BLOCK_ERROR_TIMEOUT] = "timeout", + [NOTIFY_BLOCK_ERROR_NO_SPACE] = "critical space allocation", + [NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT] = "recoverable transport", + [NOTIFY_BLOCK_ERROR_CRITICAL_TARGET] = "critical target", + [NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS] = "critical nexus", + [NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM] = "critical medium", + [NOTIFY_BLOCK_ERROR_PROTECTION] = "protection", + 
[NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE] = "kernel resource", + [NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE] = "device resource", + [NOTIFY_BLOCK_ERROR_IO] = "I/O", +}; + +static void saw_block_change(struct watch_notification *n) +{ + struct block_notification *b = (struct block_notification *)n; + unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + + if (len < sizeof(struct block_notification) / WATCH_LENGTH_GRANULARITY) + return; + + printf("BLOCK %08llx e=%u[%s] s=%llx\n", + (unsigned long long)b->dev, + n->subtype, block_subtypes[n->subtype], + (unsigned long long)b->sector); +} + +static const char *usb_subtypes[256] = { + [NOTIFY_USB_DEVICE_ADD] = "dev-add", + [NOTIFY_USB_DEVICE_REMOVE] = "dev-remove", + [NOTIFY_USB_BUS_ADD] = "bus-add", + [NOTIFY_USB_BUS_REMOVE] = "bus-remove", + [NOTIFY_USB_DEVICE_RESET] = "dev-reset", + [NOTIFY_USB_DEVICE_ERROR] = "dev-error", +}; + +static void saw_usb_event(struct watch_notification *n) +{ + struct usb_notification *u = (struct usb_notification *)n; + unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + + if (len < sizeof(struct usb_notification) / WATCH_LENGTH_GRANULARITY) + return; + + printf("USB %*.*s %s e=%x r=%x\n", + u->name_len, u->name_len, u->name, + usb_subtypes[n->subtype], + u->error, u->reserved); +} + +/* + * Consume and display events. 
+ */ +static int consumer(int fd, struct watch_queue_buffer *buf) +{ + struct watch_notification *n; + struct pollfd p[1]; + unsigned int head, tail, mask = buf->meta.mask; + + for (;;) { + p[0].fd = fd; + p[0].events = POLLIN | POLLERR; + p[0].revents = 0; + + if (poll(p, 1, -1) == -1) { + perror("poll"); + break; + } + + printf("ptrs h=%x t=%x m=%x\n", + buf->meta.head, buf->meta.tail, buf->meta.mask); + + while (head = __atomic_load_n(&buf->meta.head, __ATOMIC_ACQUIRE), + tail = buf->meta.tail, + tail != head + ) { + n = &buf->slots[tail & mask]; + printf("NOTIFY[%08x-%08x] ty=%04x sy=%04x i=%08x\n", + head, tail, n->type, n->subtype, n->info); + if ((n->info & WATCH_INFO_LENGTH) == 0) + goto out; + + switch (n->type) { + case WATCH_TYPE_META: + if (n->subtype == WATCH_META_REMOVAL_NOTIFICATION) + printf("REMOVAL of watchpoint %08x\n", + (n->info & WATCH_INFO_ID) >> + WATCH_INFO_ID__SHIFT); + break; + case WATCH_TYPE_KEY_NOTIFY: + saw_key_change(n); + break; + case WATCH_TYPE_BLOCK_NOTIFY: + saw_block_change(n); + break; + case WATCH_TYPE_USB_NOTIFY: + saw_usb_event(n); + break; + } + + tail += (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + __atomic_store_n(&buf->meta.tail, tail, __ATOMIC_RELEASE); + } + } + +out: + return 0; +} + +static struct watch_notification_filter filter = { + .nr_filters = 3, + .__reserved = 0, + .filters = { + [0] = { + .type = WATCH_TYPE_KEY_NOTIFY, + .subtype_filter[0] = UINT_MAX, + }, + [1] = { + .type = WATCH_TYPE_BLOCK_NOTIFY, + .subtype_filter[0] = UINT_MAX, + }, + [2] = { + .type = WATCH_TYPE_USB_NOTIFY, + .subtype_filter[0] = UINT_MAX, + }, + }, +}; + +int main(int argc, char **argv) +{ + struct watch_queue_buffer *buf; + size_t page_size; + int fd; + + fd = open("/dev/watch_queue", O_RDWR); + if (fd == -1) { + perror("/dev/watch_queue"); + exit(1); + } + + if (ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE) == -1) { + perror("/dev/watch_queue(size)"); + exit(1); + } + + if (ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, 
&filter) == -1) { + perror("/dev/watch_queue(filter)"); + exit(1); + } + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); + if (buf == MAP_FAILED) { + perror("mmap"); + exit(1); + } + + if (keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01) == -1) { + perror("keyctl"); + exit(1); + } + + if (syscall(__NR_watch_devices, fd, 0x04, 0) == -1) { + perror("watch_devices"); + exit(1); + } + + return consumer(fd, buf); +} ^ permalink raw reply related [flat|nested] 234+ messages in thread
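The sample output in the commit message can be read back against the consumer loop with a short worked decode of the "i=" word. The sketch below is an illustration only: the EX_* masks and shifts are inferred from that output (i=01000010 with the tail advancing by 2, i=04000018 with the tail advancing by 3, and watch IDs 0x01 and 0x04 as passed in main()), and are not copied from <linux/watch_queue.h>, which may define the fields differently. The ex_* names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* ASSUMED field layout, inferred from the sample output only.  The real
 * WATCH_INFO_* definitions live in <linux/watch_queue.h>. */
#define EX_INFO_LENGTH		0x000001f8u	/* record length field */
#define EX_INFO_LENGTH_SHIFT	3		/* yields length in 8-byte slots */
#define EX_INFO_ID		0xff000000u	/* watch ID given at watch time */
#define EX_INFO_ID_SHIFT	24

struct ex_decoded {
	unsigned int watch_id;	/* which watch produced the record */
	unsigned int slots;	/* how many ring slots the record occupies */
};

/* Split an info word into the watch ID and the record length in slots. */
static struct ex_decoded ex_decode_info(uint32_t info)
{
	struct ex_decoded d;

	d.watch_id = (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
	d.slots = (info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
	return d;
}

/* Mirror the consumer's "tail += length" step for one record. */
static uint32_t ex_next_tail(uint32_t tail, uint32_t info)
{
	return tail + ex_decode_info(info).slots;
}

/* Print one decoded record in a form comparable to the sample output. */
static void ex_print(uint32_t tail, uint32_t info)
{
	struct ex_decoded d = ex_decode_info(info);

	printf("id=%02x slots=%u tail %x -> %x\n",
	       d.watch_id, d.slots, tail, ex_next_tail(tail, info));
}
```

Under these assumptions the key record (i=01000010) belongs to watch 0x01 and moves the tail from 2 to 4, and the block record (i=04000018) belongs to watch 0x04 and moves it from 2 to 5, which agrees with the head values printed in the "ptrs" lines.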
* [PATCH 09/11] Add sample notification program [ver #7] @ 2019-08-30 13:58 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner This needs to be linked with -lkeyutils. It is run like: ./watch_test and watches "/" for mount changes and the current session keyring for key changes: # keyctl add user a a @s 1035096409 # keyctl unlink 1035096409 @s producing: # ./watch_test ptrs h=4 t=2 m=20003 NOTIFY[00000004-00000002] ty=0003 sy=0002 i=01000010 KEY 2ffc2e5d change=2[linked] aux=1035096409 ptrs h=6 t=4 m=20003 NOTIFY[00000006-00000004] ty=0003 sy=0003 i=01000010 KEY 2ffc2e5d change=3[unlinked] aux=1035096409 Other events may be produced, such as with a failing disk: ptrs h=5 t=2 m=6000004 NOTIFY[00000005-00000002] ty=0004 sy=0006 i=04000018 BLOCK 00800050 e=6[critical medium] s=5be8 This corresponds to: print_req_error: critical medium error, dev sdf, sector 23528 flags 0 in dmesg. Signed-off-by: David Howells <dhowells@redhat.com> --- samples/Kconfig | 6 + samples/Makefile | 1 samples/watch_queue/Makefile | 8 + samples/watch_queue/watch_test.c | 233 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 248 insertions(+) create mode 100644 samples/watch_queue/Makefile create mode 100644 samples/watch_queue/watch_test.c diff --git a/samples/Kconfig b/samples/Kconfig index c8dacb4dda80..2c3e07addd38 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -169,4 +169,10 @@ config SAMPLE_VFS as mount API and statx(). Note that this is restricted to the x86 arch whilst it accesses system calls that aren't yet in all arches. +config SAMPLE_WATCH_QUEUE + bool "Build example /dev/watch_queue notification consumer" + help + Build example userspace program to use the new mount_notify(), + sb_notify() syscalls and the KEYCTL_WATCH_KEY keyctl() function. 
+ endif # SAMPLES diff --git a/samples/Makefile b/samples/Makefile index 7d6e4ca28d69..a61a39047d02 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -20,3 +20,4 @@ obj-$(CONFIG_SAMPLE_TRACE_PRINTK) += trace_printk/ obj-$(CONFIG_VIDEO_PCI_SKELETON) += v4l/ obj-y += vfio-mdev/ subdir-$(CONFIG_SAMPLE_VFS) += vfs +subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue diff --git a/samples/watch_queue/Makefile b/samples/watch_queue/Makefile new file mode 100644 index 000000000000..6ee61e3ca8d2 --- /dev/null +++ b/samples/watch_queue/Makefile @@ -0,0 +1,8 @@ +# List of programs to build +hostprogs-y := watch_test + +# Tell kbuild to always build the programs +always := $(hostprogs-y) + +HOSTCFLAGS_watch_test.o += -I$(objtree)/usr/include +HOSTLDLIBS_watch_test += -lkeyutils diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c new file mode 100644 index 000000000000..6cd7101cb28c --- /dev/null +++ b/samples/watch_queue/watch_test.c @@ -0,0 +1,233 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Use /dev/watch_queue to watch for notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#include <stdbool.h> +#include <stdarg.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <signal.h> +#include <unistd.h> +#include <fcntl.h> +#include <dirent.h> +#include <errno.h> +#include <sys/wait.h> +#include <sys/ioctl.h> +#include <sys/mman.h> +#include <poll.h> +#include <limits.h> +#include <linux/watch_queue.h> +#include <linux/unistd.h> +#include <linux/keyctl.h> + +#ifndef KEYCTL_WATCH_KEY +#define KEYCTL_WATCH_KEY -1 +#endif +#ifndef __NR_watch_devices +#define __NR_watch_devices -1 +#endif + +#define BUF_SIZE 4 + +static long keyctl_watch_key(int key, int watch_fd, int watch_id) +{ + return syscall(__NR_keyctl, KEYCTL_WATCH_KEY, key, watch_fd, watch_id); +} + +static const char *key_subtypes[256] = { + [NOTIFY_KEY_INSTANTIATED] = "instantiated", + [NOTIFY_KEY_UPDATED] = "updated", + [NOTIFY_KEY_LINKED] = "linked", + [NOTIFY_KEY_UNLINKED] = "unlinked", + [NOTIFY_KEY_CLEARED] = "cleared", + [NOTIFY_KEY_REVOKED] = "revoked", + [NOTIFY_KEY_INVALIDATED] = "invalidated", + [NOTIFY_KEY_SETATTR] = "setattr", +}; + +static void saw_key_change(struct watch_notification *n) +{ + struct key_notification *k = (struct key_notification *)n; + unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + + if (len != sizeof(struct key_notification) / WATCH_LENGTH_GRANULARITY) + return; + + printf("KEY %08x change=%u[%s] aux=%u\n", + k->key_id, n->subtype, key_subtypes[n->subtype], k->aux); +} + +static const char *block_subtypes[256] = { + [NOTIFY_BLOCK_ERROR_TIMEOUT] = "timeout", + [NOTIFY_BLOCK_ERROR_NO_SPACE] = "critical space allocation", + [NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT] = "recoverable transport", + [NOTIFY_BLOCK_ERROR_CRITICAL_TARGET] = "critical target", + [NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS] = "critical nexus", + [NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM] = "critical medium", + [NOTIFY_BLOCK_ERROR_PROTECTION] = "protection", + 
[NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE] = "kernel resource", + [NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE] = "device resource", + [NOTIFY_BLOCK_ERROR_IO] = "I/O", +}; + +static void saw_block_change(struct watch_notification *n) +{ + struct block_notification *b = (struct block_notification *)n; + unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + + if (len < sizeof(struct block_notification) / WATCH_LENGTH_GRANULARITY) + return; + + printf("BLOCK %08llx e=%u[%s] s=%llx\n", + (unsigned long long)b->dev, + n->subtype, block_subtypes[n->subtype], + (unsigned long long)b->sector); +} + +static const char *usb_subtypes[256] = { + [NOTIFY_USB_DEVICE_ADD] = "dev-add", + [NOTIFY_USB_DEVICE_REMOVE] = "dev-remove", + [NOTIFY_USB_BUS_ADD] = "bus-add", + [NOTIFY_USB_BUS_REMOVE] = "bus-remove", + [NOTIFY_USB_DEVICE_RESET] = "dev-reset", + [NOTIFY_USB_DEVICE_ERROR] = "dev-error", +}; + +static void saw_usb_event(struct watch_notification *n) +{ + struct usb_notification *u = (struct usb_notification *)n; + unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + + if (len < sizeof(struct usb_notification) / WATCH_LENGTH_GRANULARITY) + return; + + printf("USB %*.*s %s e=%x r=%x\n", + u->name_len, u->name_len, u->name, + usb_subtypes[n->subtype], + u->error, u->reserved); +} + +/* + * Consume and display events. 
+ */ +static int consumer(int fd, struct watch_queue_buffer *buf) +{ + struct watch_notification *n; + struct pollfd p[1]; + unsigned int head, tail, mask = buf->meta.mask; + + for (;;) { + p[0].fd = fd; + p[0].events = POLLIN | POLLERR; + p[0].revents = 0; + + if (poll(p, 1, -1) == -1) { + perror("poll"); + break; + } + + printf("ptrs h=%x t=%x m=%x\n", + buf->meta.head, buf->meta.tail, buf->meta.mask); + + while (head = __atomic_load_n(&buf->meta.head, __ATOMIC_ACQUIRE), + tail = buf->meta.tail, + tail != head + ) { + n = &buf->slots[tail & mask]; + printf("NOTIFY[%08x-%08x] ty=%04x sy=%04x i=%08x\n", + head, tail, n->type, n->subtype, n->info); + if ((n->info & WATCH_INFO_LENGTH) == 0) + goto out; + + switch (n->type) { + case WATCH_TYPE_META: + if (n->subtype == WATCH_META_REMOVAL_NOTIFICATION) + printf("REMOVAL of watchpoint %08x\n", + (n->info & WATCH_INFO_ID) >> + WATCH_INFO_ID__SHIFT); + break; + case WATCH_TYPE_KEY_NOTIFY: + saw_key_change(n); + break; + case WATCH_TYPE_BLOCK_NOTIFY: + saw_block_change(n); + break; + case WATCH_TYPE_USB_NOTIFY: + saw_usb_event(n); + break; + } + + tail += (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + __atomic_store_n(&buf->meta.tail, tail, __ATOMIC_RELEASE); + } + } + +out: + return 0; +} + +static struct watch_notification_filter filter = { + .nr_filters = 3, + .__reserved = 0, + .filters = { + [0] = { + .type = WATCH_TYPE_KEY_NOTIFY, + .subtype_filter[0] = UINT_MAX, + }, + [1] = { + .type = WATCH_TYPE_BLOCK_NOTIFY, + .subtype_filter[0] = UINT_MAX, + }, + [2] = { + .type = WATCH_TYPE_USB_NOTIFY, + .subtype_filter[0] = UINT_MAX, + }, + }, +}; + +int main(int argc, char **argv) +{ + struct watch_queue_buffer *buf; + size_t page_size; + int fd; + + fd = open("/dev/watch_queue", O_RDWR); + if (fd == -1) { + perror("/dev/watch_queue"); + exit(1); + } + + if (ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE) == -1) { + perror("/dev/watch_queue(size)"); + exit(1); + } + + if (ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) 
== -1) { + perror("/dev/watch_queue(filter)"); + exit(1); + } + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE, + MAP_SHARED, fd, 0); + if (buf == MAP_FAILED) { + perror("mmap"); + exit(1); + } + + if (keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01) == -1) { + perror("keyctl"); + exit(1); + } + + if (syscall(__NR_watch_devices, fd, 0x04, 0) == -1) { + perror("watch_devices"); + exit(1); + } + + return consumer(fd, buf); +} ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 10/11] selinux: Implement the watch_key security hook [ver #7] 2019-08-30 13:57 ` David Howells (?) @ 2019-08-30 13:58 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 13:58 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Implement the watch_key security hook to make sure that a key grants the caller View permission in order to set a watch on a key. For the moment, the watch_devices security hook is left unimplemented as it's not obvious what the object should be since the queue is global and didn't previously exist. Signed-off-by: David Howells <dhowells@redhat.com> --- security/selinux/hooks.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 74dd46de01b6..a63249ad98ab 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -6533,6 +6533,17 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer) *_buffer = context; return rc; } + +#ifdef CONFIG_KEY_NOTIFICATIONS +static int selinux_watch_key(struct key *key) +{ + struct key_security_struct *ksec = key->security; + u32 sid = cred_sid(current_cred()); + + return avc_has_perm(&selinux_state, + sid, ksec->sid, SECCLASS_KEY, KEY_NEED_VIEW, NULL); +} +#endif #endif #ifdef CONFIG_SECURITY_INFINIBAND @@ -6965,6 +6976,9 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(key_free, selinux_key_free), LSM_HOOK_INIT(key_permission, selinux_key_permission), LSM_HOOK_INIT(key_getsecurity, selinux_key_getsecurity), +#ifdef CONFIG_KEY_NOTIFICATIONS + LSM_HOOK_INIT(watch_key, selinux_watch_key), +#endif #endif #ifdef CONFIG_AUDIT ^ permalink raw reply related [flat|nested] 234+ messages in thread
* Re: [PATCH 10/11] selinux: Implement the watch_key security hook [ver #7] 2019-08-30 13:58 ` David Howells @ 2019-08-30 14:15 ` Stephen Smalley -1 siblings, 0 replies; 234+ messages in thread From: Stephen Smalley @ 2019-08-30 14:15 UTC (permalink / raw) To: David Howells, viro Cc: Casey Schaufler, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel On 8/30/19 9:58 AM, David Howells wrote: > Implement the watch_key security hook to make sure that a key grants the > caller View permission in order to set a watch on a key. > > For the moment, the watch_devices security hook is left unimplemented as > it's not obvious what the object should be since the queue is global and > didn't previously exist. > > Signed-off-by: David Howells <dhowells@redhat.com> > --- > > security/selinux/hooks.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index 74dd46de01b6..a63249ad98ab 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -6533,6 +6533,17 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer) > *_buffer = context; > return rc; > } > + > +#ifdef CONFIG_KEY_NOTIFICATIONS > +static int selinux_watch_key(struct key *key) > +{ > + struct key_security_struct *ksec = key->security; > + u32 sid = cred_sid(current_cred()); How does this differ from current_sid()? And has current_sid() not been converted to use selinux_cred()? Looks like selinux_kernfs_init_security() also uses current_security() directly. 
> + > + return avc_has_perm(&selinux_state, > + sid, ksec->sid, SECCLASS_KEY, KEY_NEED_VIEW, NULL); > +} > +#endif > #endif > > #ifdef CONFIG_SECURITY_INFINIBAND > @@ -6965,6 +6976,9 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { > LSM_HOOK_INIT(key_free, selinux_key_free), > LSM_HOOK_INIT(key_permission, selinux_key_permission), > LSM_HOOK_INIT(key_getsecurity, selinux_key_getsecurity), > +#ifdef CONFIG_KEY_NOTIFICATIONS > + LSM_HOOK_INIT(watch_key, selinux_watch_key), > +#endif > #endif > > #ifdef CONFIG_AUDIT > ^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: [PATCH 10/11] selinux: Implement the watch_key security hook [ver #7] 2019-08-30 13:58 ` David Howells @ 2019-08-30 14:23 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 14:23 UTC (permalink / raw) To: Stephen Smalley Cc: dhowells, viro, Casey Schaufler, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel Stephen Smalley <sds@tycho.nsa.gov> wrote: > > + u32 sid = cred_sid(current_cred()); > > How does this differ from current_sid()? > > And has current_sid() not been converted to use selinux_cred()? Looks like > selinux_kernfs_init_security() also uses current_security() directly. It probably doesn't - okay I'll use that instead. David ^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: [PATCH 10/11] selinux: Implement the watch_key security hook [ver #7] 2019-08-30 13:58 ` David Howells @ 2019-08-30 14:41 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 14:41 UTC (permalink / raw) To: Stephen Smalley Cc: dhowells, viro, Casey Schaufler, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel How about the attached instead, then? David --- commit 00444a695b35c602230ac2cabb4f1d7e94e3966d Author: David Howells <dhowells@redhat.com> Date: Thu Aug 29 17:01:34 2019 +0100 selinux: Implement the watch_key security hook Implement the watch_key security hook to make sure that a key grants the caller View permission in order to set a watch on a key. For the moment, the watch_devices security hook is left unimplemented as it's not obvious what the object should be since the queue is global and didn't previously exist. Signed-off-by: David Howells <dhowells@redhat.com> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 74dd46de01b6..88df06969bed 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -6533,6 +6533,17 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer) *_buffer = context; return rc; } + +#ifdef CONFIG_KEY_NOTIFICATIONS +static int selinux_watch_key(struct key *key) +{ + struct key_security_struct *ksec = key->security; + u32 sid = current_sid(); + + return avc_has_perm(&selinux_state, + sid, ksec->sid, SECCLASS_KEY, KEY_NEED_VIEW, NULL); +} +#endif #endif #ifdef CONFIG_SECURITY_INFINIBAND @@ -6965,6 +6976,9 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(key_free, selinux_key_free), LSM_HOOK_INIT(key_permission, selinux_key_permission), LSM_HOOK_INIT(key_getsecurity, selinux_key_getsecurity), +#ifdef CONFIG_KEY_NOTIFICATIONS + LSM_HOOK_INIT(watch_key, selinux_watch_key), +#endif #endif #ifdef 
CONFIG_AUDIT ^ permalink raw reply related [flat|nested] 234+ messages in thread
* Re: [PATCH 10/11] selinux: Implement the watch_key security hook [ver #7] 2019-08-30 14:41 ` David Howells @ 2019-08-30 15:41 ` Stephen Smalley -1 siblings, 0 replies; 234+ messages in thread From: Stephen Smalley @ 2019-08-30 15:41 UTC (permalink / raw) To: David Howells Cc: viro, Casey Schaufler, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel On 8/30/19 10:41 AM, David Howells wrote: > How about the attached instead, then? Works for me. > > David > --- > commit 00444a695b35c602230ac2cabb4f1d7e94e3966d > Author: David Howells <dhowells@redhat.com> > Date: Thu Aug 29 17:01:34 2019 +0100 > > selinux: Implement the watch_key security hook > > Implement the watch_key security hook to make sure that a key grants the > caller View permission in order to set a watch on a key. > > For the moment, the watch_devices security hook is left unimplemented as > it's not obvious what the object should be since the queue is global and > didn't previously exist. 
> > Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> > > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > index 74dd46de01b6..88df06969bed 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -6533,6 +6533,17 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer) > *_buffer = context; > return rc; > } > + > +#ifdef CONFIG_KEY_NOTIFICATIONS > +static int selinux_watch_key(struct key *key) > +{ > + struct key_security_struct *ksec = key->security; > + u32 sid = current_sid(); > + > + return avc_has_perm(&selinux_state, > + sid, ksec->sid, SECCLASS_KEY, KEY_NEED_VIEW, NULL); > +} > +#endif > #endif > > #ifdef CONFIG_SECURITY_INFINIBAND > @@ -6965,6 +6976,9 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { > LSM_HOOK_INIT(key_free, selinux_key_free), > LSM_HOOK_INIT(key_permission, selinux_key_permission), > LSM_HOOK_INIT(key_getsecurity, selinux_key_getsecurity), > +#ifdef CONFIG_KEY_NOTIFICATIONS > + LSM_HOOK_INIT(watch_key, selinux_watch_key), > +#endif > #endif > > #ifdef CONFIG_AUDIT > ^ permalink raw reply [flat|nested] 234+ messages in thread
* [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
From: David Howells @ 2019-08-30 13:58 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel

Implement the watch_key security hook in Smack to make sure that a key
grants the caller Read permission in order to set a watch on a key.

Also implement the post_notification security hook to make sure that the
notification source is granted Write permission by the watch queue.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/lsm_audit.h  |    1 +
 security/smack/smack_lsm.c |   82 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 915330abf6e5..734d67889826 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -74,6 +74,7 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_FILE	12
 #define LSM_AUDIT_DATA_IBPKEY	13
 #define LSM_AUDIT_DATA_IBENDPORT 14
+#define LSM_AUDIT_DATA_NOTIFICATION 15
 	union {
 		struct path path;
 		struct dentry *dentry;
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 4c5e5a438f8b..1c2a908c6446 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -4274,7 +4274,7 @@ static int smack_key_permission(key_ref_t key_ref,
 	if (tkp == NULL)
 		return -EACCES;
 
-	if (smack_privileged_cred(CAP_MAC_OVERRIDE, cred))
+	if (smack_privileged(CAP_MAC_OVERRIDE))
 		return 0;
 
 #ifdef CONFIG_AUDIT
@@ -4320,8 +4320,81 @@ static int smack_key_getsecurity(struct key *key, char **_buffer)
 	return length;
 }
 
+
+#ifdef CONFIG_KEY_NOTIFICATIONS
+/**
+ * smack_watch_key - Smack access to watch a key for notifications.
+ * @key: The key to be watched
+ *
+ * Return 0 if the @watch->cred has permission to read from the key object and
+ * an error otherwise.
+ */
+static int smack_watch_key(struct key *key)
+{
+	struct smk_audit_info ad;
+	struct smack_known *tkp = smk_of_current();
+	int rc;
+
+	if (key == NULL)
+		return -EINVAL;
+	/*
+	 * If the key hasn't been initialized give it access so that
+	 * it may do so.
+	 */
+	if (key->security == NULL)
+		return 0;
+	/*
+	 * This should not occur
+	 */
+	if (tkp == NULL)
+		return -EACCES;
+
+	if (smack_privileged_cred(CAP_MAC_OVERRIDE, current_cred()))
+		return 0;
+
+#ifdef CONFIG_AUDIT
+	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_KEY);
+	ad.a.u.key_struct.key = key->serial;
+	ad.a.u.key_struct.key_desc = key->description;
+#endif
+	rc = smk_access(tkp, key->security, MAY_READ, &ad);
+	rc = smk_bu_note("key watch", tkp, key->security, MAY_READ, rc);
+	return rc;
+}
+#endif /* CONFIG_KEY_NOTIFICATIONS */
 #endif /* CONFIG_KEYS */
 
+#ifdef CONFIG_WATCH_QUEUE
+/**
+ * smack_post_notification - Smack access to post a notification to a queue
+ * @w_cred: The credentials of the watcher.
+ * @cred: The credentials of the event source (may be NULL).
+ * @n: The notification message to be posted.
+ */
+static int smack_post_notification(const struct cred *w_cred,
+				   const struct cred *cred,
+				   struct watch_notification *n)
+{
+	struct smk_audit_info ad;
+	struct smack_known *subj, *obj;
+	int rc;
+
+	/* Always let maintenance notifications through. */
+	if (n->type == WATCH_TYPE_META)
+		return 0;
+
+	if (!cred)
+		return 0;
+	subj = smk_of_task(smack_cred(cred));
+	obj = smk_of_task(smack_cred(w_cred));
+
+	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NOTIFICATION);
+	rc = smk_access(subj, obj, MAY_WRITE, &ad);
+	rc = smk_bu_note("notification", subj, obj, MAY_WRITE, rc);
+	return rc;
+}
+#endif /* CONFIG_WATCH_QUEUE */
+
 /*
  * Smack Audit hooks
  *
@@ -4710,8 +4783,15 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(key_free, smack_key_free),
 	LSM_HOOK_INIT(key_permission, smack_key_permission),
 	LSM_HOOK_INIT(key_getsecurity, smack_key_getsecurity),
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	LSM_HOOK_INIT(watch_key, smack_watch_key),
+#endif
 #endif /* CONFIG_KEYS */
 
+#ifdef CONFIG_WATCH_QUEUE
+	LSM_HOOK_INIT(post_notification, smack_post_notification),
+#endif
+
 	/* Audit hooks */
 #ifdef CONFIG_AUDIT
 	LSM_HOOK_INIT(audit_rule_init, smack_audit_rule_init),
* Re: [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
From: Casey Schaufler @ 2019-09-03 15:20 UTC
To: David Howells, viro
Cc: Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel, casey

On 8/30/2019 6:58 AM, David Howells wrote:
> Implement the watch_key security hook in Smack to make sure that a key
> grants the caller Read permission in order to set a watch on a key.
>
> Also implement the post_notification security hook to make sure that the
> notification source is granted Write permission by the watch queue.
>
> For the moment, the watch_devices security hook is left unimplemented as
> it's not obvious what the object should be since the queue is global and
> didn't previously exist.
>
> Signed-off-by: David Howells <dhowells@redhat.com>

I tried running your key tests and they fail in "keyctl/move/valid",
with 11 FAILED messages, finally hanging after "UNLINK KEY FROM SESSION".
It's possible that my Fedora26 system is somehow incompatible with the
tests. I don't see anything in your code that would cause this, as the
Smack policy on the system shouldn't restrict any access.

> ---
>
>  include/linux/lsm_audit.h  |    1 +
>  security/smack/smack_lsm.c |   82 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 82 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
> index 915330abf6e5..734d67889826 100644
> --- a/include/linux/lsm_audit.h
> +++ b/include/linux/lsm_audit.h
> @@ -74,6 +74,7 @@ struct common_audit_data {
>  #define LSM_AUDIT_DATA_FILE	12
>  #define LSM_AUDIT_DATA_IBPKEY	13
>  #define LSM_AUDIT_DATA_IBENDPORT 14
> +#define LSM_AUDIT_DATA_NOTIFICATION 15
>  	union {
>  		struct path path;
>  		struct dentry *dentry;
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 4c5e5a438f8b..1c2a908c6446 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -4274,7 +4274,7 @@ static int smack_key_permission(key_ref_t key_ref,
>  	if (tkp == NULL)
>  		return -EACCES;
>
> -	if (smack_privileged_cred(CAP_MAC_OVERRIDE, cred))
> +	if (smack_privileged(CAP_MAC_OVERRIDE))
>  		return 0;
>
>  #ifdef CONFIG_AUDIT
> @@ -4320,8 +4320,81 @@ static int smack_key_getsecurity(struct key *key, char **_buffer)
>  	return length;
>  }
>
> +
> +#ifdef CONFIG_KEY_NOTIFICATIONS
> +/**
> + * smack_watch_key - Smack access to watch a key for notifications.
> + * @key: The key to be watched
> + *
> + * Return 0 if the @watch->cred has permission to read from the key object and
> + * an error otherwise.
> + */
> +static int smack_watch_key(struct key *key)
> +{
> +	struct smk_audit_info ad;
> +	struct smack_known *tkp = smk_of_current();
> +	int rc;
> +
> +	if (key == NULL)
> +		return -EINVAL;
> +	/*
> +	 * If the key hasn't been initialized give it access so that
> +	 * it may do so.
> +	 */
> +	if (key->security == NULL)
> +		return 0;
> +	/*
> +	 * This should not occur
> +	 */
> +	if (tkp == NULL)
> +		return -EACCES;
> +
> +	if (smack_privileged_cred(CAP_MAC_OVERRIDE, current_cred()))
> +		return 0;
> +
> +#ifdef CONFIG_AUDIT
> +	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_KEY);
> +	ad.a.u.key_struct.key = key->serial;
> +	ad.a.u.key_struct.key_desc = key->description;
> +#endif
> +	rc = smk_access(tkp, key->security, MAY_READ, &ad);
> +	rc = smk_bu_note("key watch", tkp, key->security, MAY_READ, rc);
> +	return rc;
> +}
> +#endif /* CONFIG_KEY_NOTIFICATIONS */
>  #endif /* CONFIG_KEYS */
>
> +#ifdef CONFIG_WATCH_QUEUE
> +/**
> + * smack_post_notification - Smack access to post a notification to a queue
> + * @w_cred: The credentials of the watcher.
> + * @cred: The credentials of the event source (may be NULL).
> + * @n: The notification message to be posted.
> + */
> +static int smack_post_notification(const struct cred *w_cred,
> +				   const struct cred *cred,
> +				   struct watch_notification *n)
> +{
> +	struct smk_audit_info ad;
> +	struct smack_known *subj, *obj;
> +	int rc;
> +
> +	/* Always let maintenance notifications through. */
> +	if (n->type == WATCH_TYPE_META)
> +		return 0;
> +
> +	if (!cred)
> +		return 0;
> +	subj = smk_of_task(smack_cred(cred));
> +	obj = smk_of_task(smack_cred(w_cred));
> +
> +	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NOTIFICATION);
> +	rc = smk_access(subj, obj, MAY_WRITE, &ad);
> +	rc = smk_bu_note("notification", subj, obj, MAY_WRITE, rc);
> +	return rc;
> +}
> +#endif /* CONFIG_WATCH_QUEUE */
> +
>  /*
>   * Smack Audit hooks
>   *
> @@ -4710,8 +4783,15 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(key_free, smack_key_free),
>  	LSM_HOOK_INIT(key_permission, smack_key_permission),
>  	LSM_HOOK_INIT(key_getsecurity, smack_key_getsecurity),
> +#ifdef CONFIG_KEY_NOTIFICATIONS
> +	LSM_HOOK_INIT(watch_key, smack_watch_key),
> +#endif
>  #endif /* CONFIG_KEYS */
>
> +#ifdef CONFIG_WATCH_QUEUE
> +	LSM_HOOK_INIT(post_notification, smack_post_notification),
> +#endif
> +
>  	/* Audit hooks */
>  #ifdef CONFIG_AUDIT
>  	LSM_HOOK_INIT(audit_rule_init, smack_audit_rule_init),
>
* Re: [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
From: David Howells @ 2019-09-03 15:41 UTC
To: Casey Schaufler
Cc: dhowells, viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Casey Schaufler <casey@schaufler-ca.com> wrote:

> I tried running your key tests and they fail in "keyctl/move/valid",
> with 11 FAILED messages, finally hanging after "UNLINK KEY FROM SESSION".
> It's possible that my Fedora26 system is somehow incompatible with the
> tests. I don't see anything in your code that would cause this, as the
> Smack policy on the system shouldn't restrict any access.

Can you go into keyutils/tests/keyctl/move/valid/ and grab the test.out file?

I presume you're running with an upstream-ish kernel and a cutting edge
keyutils installed?

David
* Re: [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
From: Casey Schaufler @ 2019-09-03 17:40 UTC
To: David Howells
Cc: viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel, casey

On 9/3/2019 8:41 AM, David Howells wrote:
> Casey Schaufler <casey@schaufler-ca.com> wrote:
>
>> I tried running your key tests and they fail in "keyctl/move/valid",
>> with 11 FAILED messages, finally hanging after "UNLINK KEY FROM SESSION".
>> It's possible that my Fedora26 system is somehow incompatible with the
>> tests. I don't see anything in your code that would cause this, as the
>> Smack policy on the system shouldn't restrict any access.
> Can you go into keyutils/tests/keyctl/move/valid/ and grab the test.out file?

Inline below

> I presume you're running with an upstream-ish kernel

Built from your tree. It's possible I've missed an important CONFIG or two.

> and a cutting edge
> keyutils installed?

Also built from your tree.

>
> David

$ cat test.out
++++ BEGINNING TEST
+++ ADD KEYRING
keyctl newring wibble @s
1065401533
+++ ADD KEY
keyctl add user lizard gizzard 1065401533
483362336
+++ LIST KEYRING WITH ONE
keyctl rlist 1065401533
483362336
+++ MOVE KEY 1
keyctl move 483362336 1065401533 @s
keyctl_move: Operation not supported
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0       \_ user: lizard
==============
+++ CHECK KEY LINKAGE
keyctl rlist @s
1065401533
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0       \_ user: lizard
==============
+++ CHECK KEY REMOVED
keyctl rlist 1065401533
483362336
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0       \_ user: lizard
==============
+++ MOVE KEY 2
keyctl move 483362336 1065401533 @s
keyctl_move: Operation not supported
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0       \_ user: lizard
==============
+++ FORCE MOVE KEY 2
keyctl move -f 483362336 1065401533 @s
keyctl_move: Operation not supported
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0       \_ user: lizard
==============
+++ MOVE KEY 3
keyctl move 483362336 @s 1065401533
keyctl_move: Operation not supported
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0       \_ user: lizard
==============
+++ MOVE KEY 4
keyctl move -f 483362336 @s 1065401533
keyctl_move: Operation not supported
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0       \_ user: lizard
==============
+++ ADD KEY 2
keyctl add user lizard gizzard @s
898499184
+++ MOVE KEY 5
keyctl move 483362336 1065401533 @s
keyctl_move: Operation not supported
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0   |   \_ user: lizard
 898499184 --alswrv      0     0   \_ user: lizard
==============
+++ CHECK KEY UNMOVED
keyctl rlist 1065401533
483362336
+++ CHECK KEY UNDISPLACED
keyctl rlist @s
1065401533 898499184
+++ FORCE MOVE KEY 6
keyctl move -f 483362336 1065401533 @s
keyctl_move: Operation not supported
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0   |   \_ user: lizard
 898499184 --alswrv      0     0   \_ user: lizard
==============
+++ CHECK KEY REMOVED
keyctl rlist 1065401533
483362336
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0   |   \_ user: lizard
 898499184 --alswrv      0     0   \_ user: lizard
==============
+++ CHECK KEY DISPLACED
keyctl rlist @s
1065401533 898499184
=== FAILED ===
Session Keyring
 680859405 --alswrv      0     0  keyring: RHTS/keyctl/32472
1065401533 --alswrv      0     0   \_ keyring: wibble
 483362336 --alswrv      0     0   |   \_ user: lizard
 898499184 --alswrv      0     0   \_ user: lizard
==============
+++ UNLINK KEY FROM SESSION
keyctl unlink 483362336 @s
+++ WAITING FOR KEY TO BE UNLINKED
keyctl unlink 483362336 @s
keyctl_unlink: No such file or directory
keyctl unlink 483362336 @s
keyctl_unlink: No such file or directory
...
* Re: [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
  2019-09-03 15:41 ` David Howells
@ 2019-09-03 18:06 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-03 18:06 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: dhowells, viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Casey Schaufler <casey@schaufler-ca.com> wrote:

> Built from your tree.

What branch?  keys-next?

> keyctl move 483362336 1065401533 @s
> keyctl_move: Operation not supported

Odd.  That should be unconditional if you have CONFIG_KEYS and v5.3-rc1.  Can
you try:

	keyctl supports

or just:

	keyctl add user a a @s

which will give you an id, say 1234, then:

	keyctl move 1234 @s @u

see if that works.

David
* Re: [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
  2019-09-03 18:06 ` David Howells
@ 2019-09-03 22:16 ` Casey Schaufler
  -1 siblings, 0 replies; 234+ messages in thread
From: Casey Schaufler @ 2019-09-03 22:16 UTC (permalink / raw)
  To: David Howells
  Cc: viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel, casey

On 9/3/2019 11:06 AM, David Howells wrote:
> Casey Schaufler <casey@schaufler-ca.com> wrote:
>
>> Built from your tree.
> What branch?  keys-next?

I rebuilt with keys-next, updated the tests again, and now
the suite looks to be running trouble free.

I do see a message SKIP DUE TO DISABLED SELINUX which I
take to mean that there is an SELinux specific test.

>
>> keyctl move 483362336 1065401533 @s
>> keyctl_move: Operation not supported
> Odd.  That should be unconditional if you have CONFIG_KEYS and v5.3-rc1.  Can
> you try:
>
> 	keyctl supports
>
> or just:
>
> 	keyctl add user a a @s
>
> which will give you an id, say 1234, then:
>
> 	keyctl move 1234 @s @u
>
> see if that works.
>
> David
* Re: [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
  2019-09-03 18:06 ` David Howells
@ 2019-09-03 22:39 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-03 22:39 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: dhowells, viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Casey Schaufler <casey@schaufler-ca.com> wrote:

> I rebuilt with keys-next, updated the tests again, and now
> the suite looks to be running trouble free.

Glad to hear that, thanks.

> I do see a message SKIP DUE TO DISABLED SELINUX which I take to mean that
> there is an SELinux specific test.

tests/bugzillas/bz1031154/runtest.sh

David
* Re: [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
  2019-09-03 18:06 ` David Howells
@ 2019-09-04 12:08 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-04 12:08 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: dhowells, viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Casey Schaufler <casey@schaufler-ca.com> wrote:

> I rebuilt with keys-next, updated the tests again, and now
> the suite looks to be running trouble free.

Can I put you down as an Acked-by or something on this patch?

Thanks,
David
* Re: [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #7]
  2019-09-04 12:08 ` David Howells
@ 2019-09-04 14:56 ` Casey Schaufler
  -1 siblings, 0 replies; 234+ messages in thread
From: Casey Schaufler @ 2019-09-04 14:56 UTC (permalink / raw)
  To: David Howells
  Cc: viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel, casey

On 9/4/2019 5:08 AM, David Howells wrote:
> Casey Schaufler <casey@schaufler-ca.com> wrote:
>
>> I rebuilt with keys-next, updated the tests again, and now
>> the suite looks to be running trouble free.
> Can I put you down as an Acked-by or something on this patch?

I haven't done anything to see if the patch is actually useful.
I don't have much (read: anything) in the way of key tests for Smack,
so I can't say if this is what I want long term. But as it does
appear harmless, yes, you can add my Acked-by on this.

>
> Thanks,
> David
* watch_queue(7) manpage
  2019-08-30 13:57 ` David Howells
  (?)
@ 2019-08-30 14:15 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 14:15 UTC (permalink / raw)
  To: viro
  Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

.\"
.\" Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
.\" Written by David Howells (dhowells@redhat.com)
.\"
.\" This program is free software; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public Licence
.\" as published by the Free Software Foundation; either version
.\" 2 of the Licence, or (at your option) any later version.
.\"
.TH WATCH_QUEUE 7 "28 Aug 2019" Linux "General Kernel Notifications"
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH NAME
/dev/watch_queue \- General kernel notification queue
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH SYNOPSIS
#include <linux/watch_queue.h>
.EX

int fd = open("/dev/watch_queue", O_RDWR);
ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, size / page_size);
ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
buf = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
.EE
.SH OVERVIEW
.PP
The general kernel notification queue is a general purpose transport for kernel
notification messages to userspace.  Notification messages are marked with type
information so that events from multiple sources can be distinguished.  Messages
are also of variable length to accommodate different information for each type.
.PP
This queue is implemented as a misc device that can be opened multiple times,
each opening creating a fully independent queue.  Queues are then configured
with the size and filtering, event sources are attached and the queue is mapped
into a process's VM.
.PP
Queues take the form of a ring buffer with shared index pointers, all of which
is accessed directly within the mapping.  There are no read and write methods,
though poll is provided so that the buffer can be waited upon.
.PP
A queue pins a certain amount of locked kernel memory (so that the kernel can
write a notification into it from contexts where swapping cannot be performed),
and so is subject to resource limit restrictions on
.BR RLIMIT_MEMLOCK .
.PP
Sources must be attached to a queue manually; there's no single global event
source, but rather a variety of sources, each of which can be attached to by
multiple queues.  Attachments can be set up by:
.TP
.BR keyctl_watch_key (3)
Monitor a key or keyring for changes.
.TP
.BR watch_devices (2)
Monitor a global source of device events from USB and block devices, such as
device detection, device removal and I/O errors.
.PP
Because a source can produce a lot of different events, not all of which may be
of interest to the watcher, a filter can be set on a queue to determine whether
a particular event will get inserted in a queue at the point of posting inside
the kernel.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH RING STRUCTURE
.PP
The ring buffer is divided into 8-byte slots and each notification message
occupies between 1 and 63 of those slots.  Each message begins with a header of
the form:
.PP
.in +4n
.EX
struct watch_notification {
	__u32	type:24;
	__u32	subtype:8;
	__u32	info;
};
.EE
.in
.PP
Where
.I type
indicates the general class of notification,
.I subtype
indicates the specific type of notification within that class and
.I info
includes the length (in slots), the watcher's ID and some type-specific
information.
.PP
Messages inserted into the buffer aren't allowed to split over the end of the
buffer; instead a
.I skip
notification will be inserted to pad to the end of the buffer.
A skip notification will have the type set to
.B WATCH_TYPE_META
and the subtype set to
.BR WATCH_META_SKIP_NOTIFICATION ,
with the length indicating how much should be skipped.
.PP
To avoid the need for an extra page dedicated solely to metadata pointers, the
first few slots are covered by a permanent skip notification and contain ring
metadata including the pointers.  The buffer has a 'header' of the form:
.PP
.in +4n
.EX
struct {
	struct watch_notification watch;
	__u32	head;
	__u32	tail;
	__u32	mask;
	__u32	__reserved;
};
.EE
.in
.PP
This includes the ring indices,
.IR head " and " tail ,
and a
.I mask
to mask them off with before use.  When using the ring indices, the following
precautions should be observed:
.TP
.B (1)
.I head
indicates where the kernel will insert the next message into the buffer.  Only
the kernel is allowed to change head.
.TP
.B (2)
.I tail
indicates where the next message for userspace to consume can be found; tail
will never be changed by the kernel.
.TP
.B (3)
An
.IR acquire -class
memory barrier must be used to read head.  It is not necessary to use a memory
barrier to read tail.
.TP
.B (4)
The buffer is empty if tail == head.
.TP
.B (5)
head and tail should not be masked off after increment, but rather left to wrap
naturally; this means that the index must be masked off before being used to
access the buffer.
.TP
.B (6)
After consuming a message, the length (in slots) of the message should be added
to tail and tail must not then be masked off.
.TP
.B (7)
A
.IR release -class
memory barrier must be used to update
.IR tail .
.PP
If the head and tail values become too far separated or head points to a
forbidden area of the buffer, no further message insertion will take place and
.IR poll ()
will flag
.BR POLLERR .
Otherwise, poll() will flag
.BR POLLIN " and " POLLRDNORM
if tail != head.
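The index rules (4) to (6) above come down to a few lines of unsigned arithmetic. What follows is only a sketch of that arithmetic, not code from the patch set: the ring_* helper names are made up for illustration, and the acquire/release barriers from rules (3) and (7) are left out because nothing here touches a shared buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rule (4): the ring is empty when the two free-running indices match. */
static inline bool ring_is_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

/*
 * Number of slots waiting to be consumed.  Because neither index is ever
 * masked, the unsigned subtraction stays correct across the 2^32 wrap.
 */
static inline uint32_t ring_pending(uint32_t head, uint32_t tail)
{
	return head - tail;
}

/* Rule (5): mask only at the point where the index is used for access. */
static inline uint32_t ring_slot(uint32_t index, uint32_t mask)
{
	return index & mask;
}

/* Rule (6): advance tail by the message length in slots, left unmasked. */
static inline uint32_t ring_advance(uint32_t tail, uint32_t len)
{
	return tail + len;
}
```

With a 512-slot ring (mask 511), an index of 517 refers to slot 5, and a tail of 0xffffffff advanced by 3 slots becomes 2 without any special handling.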
.PP
The ring as a whole is described by the following structure:
.PP
.in +4n
.EX
struct watch_queue_buffer {
	union {
		struct {
			struct watch_notification watch;
			__u32	head;
			__u32	tail;
			__u32	mask;
			__u32	__reserved;
		} meta;
		struct watch_notification slots[0];
	};
};
.EE
.in
.PP
Where
.I meta
covers the slots holding the ring indices and other metadata.  Note that the
metadata may be extended in future.  Its size can be determined by checking the
length of the skip pseudo-message that covers it (see
.IR meta.watch ).
.PP
In the event that the ring is full when the kernel needs to write in a
notification, it will set
.B WATCH_INFO_NOTIFICATIONS_LOST
in
.IR meta.watch.info
to indicate an overrun.  If the flag is noticed as being set, the entire word
can be simply cleared without bothering the kernel as the kernel doesn't ever
read it.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH IOCTL COMMANDS
The device has the following
.IR ioctl ()
commands:
.TP
.B IOC_WATCH_QUEUE_SET_SIZE
The ioctl argument indicates the size of the buffer in pages and must be a
power of two.  This command allocates the memory to back the buffer.
.IP
This may only be done once and the buffer cannot be mmap'd until this command
has been done.
.TP
.B IOC_WATCH_QUEUE_SET_FILTER
This is used to set filters on the notifications that get written into the
buffer.  The ioctl argument points to a structure of the following form:
.IP
.in +4n
.EX
struct watch_notification_filter {
	__u32	nr_filters;
	__u32	__reserved;
	struct watch_notification_type_filter filters[];
};
.EE
.in
.IP
Where
.I nr_filters
indicates the number of elements in the
.IR filters []
array.
Each element in the filters array specifies a filter and is of the following
form:
.IP
.in +4n
.EX
struct watch_notification_type_filter {
	__u32	type;
	__u32	info_filter;
	__u32	info_mask;
	__u32	subtype_filter[8];
};
.EE
.in
.IP
Where
.I type
refers to the type field in a notification record header, info_filter and
info_mask refer to the info field and subtype_filter is a bit-mask of subtypes.
.IP
If no filters are installed, all notifications are allowed by default; if one
or more filters are installed, notifications are disallowed unless they match a
filter.
.IP
A notification matches a filter if, for notification N and filter F:
.IP
.in +4n
.EX
N->type == F->type &&
(F->subtype_filter[N->subtype >> 5] & (1U << (N->subtype & 31))) &&
(N->info & F->info_mask) == F->info_filter
.EE
.in
.IP
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH EXAMPLE
To use the notification mechanism, first of all the device has to be opened,
the size must be set and the buffer mapped:
.PP
.in +4n
.EX
int wfd = open("/dev/watch_queue", O_RDWR);

ioctl(wfd, IOC_WATCH_QUEUE_SET_SIZE, 1);

struct watch_queue_buffer *buf =
	mmap(NULL, 1 * PAGE_SIZE, PROT_READ | PROT_WRITE,
	     MAP_SHARED, wfd, 0);
.EE
.in
.PP
From this point, the buffer is open for business.  Filters can be set to
restrict the notifications that get inserted into the buffer from the sources
that are watched.
For example:
.PP
.in +4n
.EX
static struct watch_notification_filter filter = {
	.nr_filters	= 2,
	.__reserved	= 0,
	.filters = {
		[0] = {
			.type			= WATCH_TYPE_KEY_NOTIFY,
			.subtype_filter[0]	= 1 << NOTIFY_KEY_LINKED,
			.info_filter		= 1 << WATCH_INFO_FLAG_2,
			.info_mask		= 1 << WATCH_INFO_FLAG_2,
		},
		[1] = {
			.type			= WATCH_TYPE_USB_NOTIFY,
			.subtype_filter[0]	= 1 << NOTIFY_USB_DEVICE_ADD,
		},
	},
};

ioctl(wfd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
.EE
.in
.PP
This will only allow key-change notifications that indicate a key is linked
into a keyring, and then only if type-specific flag WATCH_INFO_FLAG_2 is set on
the notification.  It will also allow USB device-add notifications, blocking
other USB notifications and all block device notifications.
.PP
Sources can then be watched, for example:
.PP
.in +4n
.EX
keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, wfd, 0x33);
watch_devices(wfd, 0x55, 0);
.EE
.in
.PP
The first places a watch on the process's session keyring, directing the
notifications to the buffer we just created and specifying that they should be
tagged with 0x33 in the info ID field.  The second places a watch on the global
device notifications queue, specifying that notifications from that should be
tagged with info ID 0x55.
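To make the filter semantics concrete, here is a minimal userspace sketch of the matching rule given under IOC_WATCH_QUEUE_SET_FILTER, including the "no filters means allow everything" default. It is an illustration only, not the kernel's implementation: the ex_* structures are local stand-ins that mirror the layouts shown in the man page (with the type and subtype bitfields widened to plain integers), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the record header; bitfields are not modelled. */
struct ex_notification {
	uint32_t type;		/* 24-bit type in the real header */
	uint8_t  subtype;	/* 8-bit subtype, 0..255 */
	uint32_t info;
};

/* Local stand-in mirroring struct watch_notification_type_filter. */
struct ex_type_filter {
	uint32_t type;
	uint32_t info_filter;
	uint32_t info_mask;
	uint32_t subtype_filter[8];	/* 256 bits, one per subtype */
};

/* The three-part test from the man page, for notification n and filter f. */
static inline bool ex_filter_matches(const struct ex_notification *n,
				     const struct ex_type_filter *f)
{
	return n->type == f->type &&
	       (f->subtype_filter[n->subtype >> 5] & (1U << (n->subtype & 31))) &&
	       (n->info & f->info_mask) == f->info_filter;
}

/* With no filters everything passes; otherwise some filter must match. */
static inline bool ex_allowed(const struct ex_notification *n,
			      const struct ex_type_filter *filters,
			      uint32_t nr_filters)
{
	if (nr_filters == 0)
		return true;
	for (uint32_t i = 0; i < nr_filters; i++)
		if (ex_filter_matches(n, &filters[i]))
			return true;
	return false;
}
```

Note that a filter with info_mask and info_filter both left at zero accepts any info value, which is how the USB entry in the example above behaves.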
.PP
The device file descriptor can then be polled to find out when the kernel
writes something into the buffer or if the ring indices become incoherent:
.PP
.in +4n
.EX
struct pollfd p[1];
p[0].fd = wfd;
p[0].events = POLLIN | POLLERR;
p[0].revents = 0;
poll(p, 1, -1);
.EE
.in
.PP
When it is determined that there is something in the buffer, messages can be
read out of the ring with something like the following:
.PP
.in +4n
.EX
struct watch_notification *n;
unsigned int len, head, tail, mask = buf->meta.mask;

while (head = __atomic_load_n(&buf->meta.head, __ATOMIC_ACQUIRE),
       tail = buf->meta.tail,
       tail != head
       ) {
	n = &buf->slots[tail & mask];
	len = n->info & WATCH_INFO_LENGTH;
	len >>= WATCH_INFO_LENGTH__SHIFT;
	if (len == 0)
		abort();

	switch (n->type) {
	case WATCH_TYPE_META:
		switch (n->subtype) {
		case WATCH_META_REMOVAL_NOTIFICATION:
			saw_removal_notification(n);
			break;
		}
		break;
	case WATCH_TYPE_KEY_NOTIFY:
		saw_key_change(n);
		break;
	case WATCH_TYPE_USB_NOTIFY:
		saw_usb_event(n);
		break;
	}

	tail += len;
	__atomic_store_n(&buf->meta.tail, tail, __ATOMIC_RELEASE);
}
.EE
.in
.PP
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH VERSIONS
The notification queue driver first appeared in v??? of the Linux kernel.
.SH SEE ALSO
.ad l
.nh
.BR ioctl (2),
.BR keyctl (1),
.BR keyctl_watch_key (3),
.BR poll (2),
.BR setrlimit (2)
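The read loop in the EXAMPLE section depends on the kernel header and a live queue, so it cannot be tried in isolation. The sketch below walks an ordinary in-memory array in the same way, to show how the per-record length and the unmasked tail interact when records run past the end of the array. Everything prefixed ex_ or EX_ is hypothetical: in particular the position and width of the length field within info are assumptions (6 bits at bit 0, chosen only because the page says a record is 1 to 63 slots), and the atomics are omitted because there is no concurrent producer here.

```c
#include <assert.h>
#include <stdint.h>

#define EX_INFO_LENGTH		0x3fU	/* assumed: 6 bits hold 1..63 slots */
#define EX_INFO_LENGTH_SHIFT	0	/* assumed bit position */

/* One 8-byte slot; the first slot of a record is its header. */
struct ex_slot {
	uint32_t type_subtype;	/* packed type/subtype, not examined here */
	uint32_t info;
};

/*
 * Step through the records between tail and head, counting them, and return
 * the tail value a consumer would publish afterwards.  Stops early on a
 * zero-length record, which the man page example treats as fatal.
 */
static inline uint32_t ex_drain(const struct ex_slot *slots, uint32_t mask,
				uint32_t head, uint32_t tail,
				unsigned int *nr_records)
{
	*nr_records = 0;
	while (tail != head) {
		const struct ex_slot *n = &slots[tail & mask];
		uint32_t len = (n->info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;

		if (len == 0)
			break;		/* corrupt ring; give up */
		(*nr_records)++;
		tail += len;		/* left unmasked; masked only on access */
	}
	return tail;
}
```

For an 8-slot array with a 2-slot record at index 6 and a 3-slot record at index 0, starting from tail 6 with head 11 visits both records and returns 11, the second record being found through 8 & 7 == 0.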
* watch_devices(2) manpage 2019-08-30 13:57 ` David Howells @ 2019-08-30 14:15 ` David Howells -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 14:15 UTC (permalink / raw)
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

'\" t
.\" Copyright (c) 2019 David Howells <dhowells@redhat.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH WATCH_DEVICES 2 2019-08-29 "Linux" "Linux Programmer's Manual"
.SH NAME
watch_devices \- Watch for global device notifications
.SH SYNOPSIS
.nf
.B #include <linux/watch_queue.h>
.br
.B #include <unistd.h>
.br
.BI "int watch_devices(int " watch_fd ", int " watch_id ", unsigned int " flags );
.fi
.PP
.IR Note :
There is no glibc wrapper for this system call.
.SH DESCRIPTION
.PP
.BR watch_devices ()
attaches a watch on the global device notification source to a previously
opened and configured watch queue.
See
.BR watch_queue (7)
for more information on how to set up and use such a queue.
.PP
The global device notification source is provided with events from a number of
sources, including block device errors and USB events.
Each notification type has a specific format.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SS Block Device Notifications
Events on block devices, such as I/O errors, are posted to any watching queues.
The message format is:
.PP
.in +4n
.EX
struct block_notification {
    struct watch_notification watch;
    __u64 dev;
    __u64 sector;
};
.EE
.in
.PP
The
.I watch.type
field will be set to
.BR WATCH_TYPE_BLOCK_NOTIFY ,
the
.I watch.subtype
field will contain a constant that indicates the particular event that occurred
and the watch_id passed to watch_devices() will be placed in
.I watch.info
in the ID field.
.PP
.I dev
will contain the major and minor device numbers in
.B dev_t
form and
.I sector
will contain the first sector the notification pertains to.
.PP
The following events are defined:
.PP
.in +4n
.TS
lB l.
NOTIFY_BLOCK_ERROR_TIMEOUT
NOTIFY_BLOCK_ERROR_NO_SPACE
NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT
NOTIFY_BLOCK_ERROR_CRITICAL_TARGET
NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS
NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM
NOTIFY_BLOCK_ERROR_PROTECTION
NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE
NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE
NOTIFY_BLOCK_ERROR_IO
.TE
.in
.PP
All of these indicate error conditions.
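.PP
Because a record's length is carried in its header, a consumer can check that
a block record is at least as long as struct block_notification before reading
dev and sector.
The following is a minimal sketch of that check, assuming the 8-byte slot size
described in watch_queue(7); the ex_* types are local stand-ins for the real
header definitions and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring the layouts shown above (assumed, not the
 * kernel header itself). */
struct ex_watch_notification {
    uint32_t type:24;
    uint32_t subtype:8;
    uint32_t info;
};

struct ex_block_notification {
    struct ex_watch_notification watch;
    uint64_t dev;
    uint64_t sector;
};

#define EX_SLOT_SIZE 8u  /* ring slot size in bytes, per watch_queue(7) */

/*
 * Copy a block record out of the ring if it is long enough.
 * len_slots is the record length taken from the header's info word.
 * Returns 0 on success or -1 if the record is too short to hold the struct.
 */
static int ex_read_block_record(const void *record, unsigned int len_slots,
                                struct ex_block_notification *out)
{
    if ((size_t)len_slots * EX_SLOT_SIZE < sizeof(*out))
        return -1;
    memcpy(out, record, sizeof(*out));
    return 0;
}
```

Copying the record out with memcpy() rather than casting the slot pointer keeps
the consumer's view stable even if the ring slot is later reused.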
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SS USB Device Notifications
Events on USB devices, such as I/O errors, are posted to any watching queues.
The message format is:
.PP
.in +4n
.EX
struct usb_notification {
    struct watch_notification watch;
    __u32 error;
    __u32 reserved;
    __u8 name_len;
    __u8 name[0];
};
.EE
.in
.PP
The
.I watch.type
field will be set to
.BR WATCH_TYPE_USB_NOTIFY ,
the
.I watch.subtype
field will contain a constant that indicates the particular event that occurred
and the watch_id passed to watch_devices() will be placed in
.I watch.info
in the ID field.
.PP
.IR name " and " name_len
indicate the textual name of the USB device that originated the notification.
The name will be truncated to
.B USB_NOTIFICATION_MAX_NAME_LEN
if it is longer than that.
.PP
The following subtypes are currently defined:
.TP
.B NOTIFY_USB_DEVICE_ADD
A new USB device has been plugged in.
.TP
.B NOTIFY_USB_DEVICE_REMOVE
A USB device has been unplugged.
.TP
.B NOTIFY_USB_BUS_ADD
A new USB bus is now available.
.TP
.B NOTIFY_USB_BUS_REMOVE
A USB bus has been removed.
.TP
.B NOTIFY_USB_DEVICE_RESET
A USB device has been reset.
.TP
.B NOTIFY_USB_DEVICE_ERROR
A USB device has generated an error; a suitable error code will have been
placed in
.IR error .
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH RETURN VALUE
On success, the function returns 0.
On error, \-1 is returned, and
.I errno
is set appropriately.
.SH ERRORS
The following errors may be returned:
.TP
.B EBADF
.I watch_fd
is an invalid file descriptor.
.TP
.B EBADSLT
The watch does not exist and so cannot be removed.
.TP
.B EBUSY
The source is already attached to the watch device instance specified by
.I watch_fd
and so cannot be added.
.TP
.B EINVAL
.I watch_fd
does not refer to a watch_queue device file.
.TP
.B EINVAL
.IR watch_fd " or " watch_id
is out of range.
.TP
.B EINVAL
Unsupported
.I flags
set.
.TP
.B ENOMEM
Insufficient memory available to allocate a watch record.
.TP
.B EPERM
The caller does not have the required privileges.
.SH CONFORMING TO
This system call is Linux-specific and should not be used in programs intended
to be portable.
.SH VERSIONS
The notification queue driver first appeared in v??? of the Linux kernel.
.SH NOTES
Glibc does not (yet) provide a wrapper for the
.BR watch_devices "()"
system call; call it using
.BR syscall (2).
.SH SEE ALSO
.BR watch_queue (7)
* keyctl_watch_key.3 manpage 2019-08-30 13:57 ` David Howells @ 2019-08-30 14:16 ` David Howells -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-30 14:16 UTC (permalink / raw)
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

.\"
.\" Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
.\" Written by David Howells (dhowells@redhat.com)
.\"
.\" This program is free software; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License
.\" as published by the Free Software Foundation; either version
.\" 2 of the License, or (at your option) any later version.
.\"
.TH KEYCTL_WATCH_KEY 3 "28 Aug 2019" Linux "Linux Key Management Calls"
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH NAME
keyctl_watch_key \- Watch for changes to a key
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH SYNOPSIS
.nf
.B #include <keyutils.h>
.sp
.BI "long keyctl_watch_key(key_serial_t " key ,
.BI "                      int " watch_queue_fd ,
.BI "                      int " watch_id ");"
.fi
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH DESCRIPTION
.BR keyctl_watch_key ()
sets or removes a watch on
.IR key .
.PP
.I watch_id
specifies the ID for a watch that will be included in notification messages.
It can be between 0 and 255 to add a watch; it should be \-1 to remove a watch.
.PP
.I watch_queue_fd
is a file descriptor attached to a watch_queue device instance.
Multiple openings of a device provide separate instances.
Each device instance can only have one watch on any particular key.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SS Notification Record
.PP
Key-specific notification messages that the kernel emits into the buffer have
the following format:
.PP
.in +4n
.EX
struct key_notification {
    struct watch_notification watch;
    __u32 key_id;
    __u32 aux;
};
.EE
.in
.PP
The
.I watch.type
field will be set to
.B WATCH_TYPE_KEY_NOTIFY
and the
.I watch.subtype
field will contain one of the constants listed below, indicating the event that
occurred.
The watch_id passed to keyctl_watch_key() will be placed in
.I watch.info
in the ID field.
The following events are defined:
.TP
.B NOTIFY_KEY_INSTANTIATED
This indicates that a watched key got instantiated or negatively instantiated.
.I key_id
indicates the key that was instantiated and
.I aux
is unused.
.TP
.B NOTIFY_KEY_UPDATED
This indicates that a watched key got updated or instantiated by update.
.I key_id
indicates the key that was updated and
.I aux
is unused.
.TP
.B NOTIFY_KEY_LINKED
This indicates that a key got linked into a watched keyring.
.I key_id
indicates the keyring that was modified and
.I aux
indicates the key that was added.
.TP
.B NOTIFY_KEY_UNLINKED
This indicates that a key got unlinked from a watched keyring.
.I key_id
indicates the keyring that was modified and
.I aux
indicates the key that was removed.
.TP
.B NOTIFY_KEY_CLEARED
This indicates that a watched keyring got cleared.
.I key_id
indicates the keyring that was cleared and
.I aux
is unused.
.TP
.B NOTIFY_KEY_REVOKED
This indicates that a watched key got revoked.
.I key_id
indicates the key that was revoked and
.I aux
is unused.
.TP
.B NOTIFY_KEY_INVALIDATED
This indicates that a watched key got invalidated.
.I key_id
indicates the key that was invalidated and
.I aux
is unused.
.TP
.B NOTIFY_KEY_SETATTR
This indicates that a watched key had its attributes (owner, group,
permissions, timeout) modified.
.I key_id
indicates the key that was modified and
.I aux
is unused.
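.PP
The roles of key_id and aux listed above can be summarised in code.
The sketch below formats a one-line description of a key event; it is only an
illustration, and the EX_KEY_* enum is an assumed numbering that follows the
order of the list above.
Real code should use the NOTIFY_KEY_* constants from the kernel header; the
function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed numbering, following the order of the events listed above. */
enum ex_key_event {
    EX_KEY_INSTANTIATED,
    EX_KEY_UPDATED,
    EX_KEY_LINKED,
    EX_KEY_UNLINKED,
    EX_KEY_CLEARED,
    EX_KEY_REVOKED,
    EX_KEY_INVALIDATED,
    EX_KEY_SETATTR,
};

/* Events for which only key_id is meaningful and aux is unused. */
static const char *const ex_simple_events[] = {
    [EX_KEY_INSTANTIATED] = "instantiated",
    [EX_KEY_UPDATED]      = "updated",
    [EX_KEY_CLEARED]      = "cleared",
    [EX_KEY_REVOKED]      = "revoked",
    [EX_KEY_INVALIDATED]  = "invalidated",
    [EX_KEY_SETATTR]      = "attributes changed",
};

/* Write a description of the event into out; returns snprintf()'s result. */
static int ex_describe_key_event(unsigned int subtype, uint32_t key_id,
                                 uint32_t aux, char *out, size_t outlen)
{
    size_t n = sizeof(ex_simple_events) / sizeof(ex_simple_events[0]);

    switch (subtype) {
    case EX_KEY_LINKED:
        /* key_id is the keyring, aux is the key that was added. */
        return snprintf(out, outlen, "key %u linked into keyring %u",
                        (unsigned int)aux, (unsigned int)key_id);
    case EX_KEY_UNLINKED:
        /* key_id is the keyring, aux is the key that was removed. */
        return snprintf(out, outlen, "key %u unlinked from keyring %u",
                        (unsigned int)aux, (unsigned int)key_id);
    default:
        if (subtype < n && ex_simple_events[subtype])
            return snprintf(out, outlen, "key %u %s",
                            (unsigned int)key_id, ex_simple_events[subtype]);
        return snprintf(out, outlen, "unknown key event %u", subtype);
    }
}
```

Only the link and unlink events carry a second key in aux, which is why they
are handled separately from the table-driven cases.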
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SS Removal Notification
When a watched key is garbage collected, all of its watches are automatically
destroyed and a notification is delivered to each watcher.
This will normally be an extended notification of the form:
.PP
.in +4n
.EX
struct watch_notification_removal {
    struct watch_notification watch;
    __u64 id;
};
.EE
.in
.PP
The
.I watch.type
field will be set to
.B WATCH_TYPE_META
and the
.I watch.subtype
field will contain
.BR WATCH_META_REMOVAL_NOTIFICATION .
If the extended notification is given, then the length will be 2 units,
otherwise it will be 1 and only the header will be present.
.PP
The watch_id passed to
.BR keyctl_watch_key ()
will be placed in
.I watch.info
in the ID field.
.PP
If the extension is present,
.I id
will be set to the ID of the destroyed key.
.PP
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH RETURN VALUE
On success,
.BR keyctl_watch_key ()
returns
.BR 0 .
On error, the value
.B \-1
will be returned and
.I errno
will have been set to an appropriate error.
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH ERRORS
.TP
.B ENOKEY
The specified key does not exist.
.TP
.B EKEYEXPIRED
The specified key has expired.
.TP
.B EKEYREVOKED
The specified key has been revoked.
.TP
.B EACCES
The named key exists, but does not grant
.B view
permission to the calling process.
.TP
.B EBUSY
The specified key already has a watch on it for that device instance (add
only).
.TP
.B EBADSLT
The specified key doesn't have a watch on it (removal only).
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH LINKING
This is a library function that can be found in
.IR libkeyutils .
When linking,
.B \-lkeyutils
should be specified to the linker.
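.PP
A caller will usually want to turn the errno values listed under ERRORS into
something readable after a failed call.
The helper below is a small sketch of one way to do that; its name and message
strings are invented for this example and are not part of libkeyutils.
It assumes a Linux <errno.h>, which defines the key-related error codes.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Return a short explanation for an errno value left behind by a failed
 * keyctl_watch_key() call.  Hypothetical helper; the strings are
 * illustrative paraphrases of the ERRORS section.
 */
static const char *ex_watch_key_error(int err)
{
    switch (err) {
    case ENOKEY:
        return "key does not exist";
    case EKEYEXPIRED:
        return "key has expired";
    case EKEYREVOKED:
        return "key has been revoked";
    case EACCES:
        return "key does not grant view permission";
    case EBUSY:
        return "key is already watched by this queue";  /* add only */
    case EBADSLT:
        return "key has no watch to remove";            /* removal only */
    default:
        return "unexpected error";
    }
}
```

It might be used as ex_watch_key_error(errno) immediately after the call
returns \-1, before any other library call can overwrite errno.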
.\"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
.SH SEE ALSO
.ad l
.nh
.BR keyctl (1),
.BR add_key (2),
.BR keyctl (2),
.BR request_key (2),
.BR keyctl (3),
.BR keyrings (7),
.BR keyutils (7)
* keyctl_watch_key.3 manpage @ 2019-08-30 14:16 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-30 14:16 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel .\" .\" Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. .\" Written by David Howells (dhowells@redhat.com) .\" .\" This program is free software; you can redistribute it and/or .\" modify it under the terms of the GNU General Public License .\" as published by the Free Software Foundation; either version .\" 2 of the License, or (at your option) any later version. .\" .TH KEYCTL_GRANT_PERMISSION 3 "28 Aug 2019" Linux "Linux Key Management Calls" .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SH NAME keyctl_watch_key \- Watch for changes to a key .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SH SYNOPSIS .nf .B #include <keyutils.h> .sp .BI "long keyctl_watch_key(key_serial_t " key , .BI " int " watch_queue_fd .BI " int " watch_id ");" .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SH DESCRIPTION .BR keyctl_watch_key () sets or removes a watch on .IR key . .PP .I watch_id specifies the ID for a watch that will be included in notification messages. It can be between 0 and 255 to add a key; it should be -1 to remove a key. .PP .I watch_queue_fd is a file descriptor attached to a watch_queue device instance. Multiple openings of a device provide separate instances. Each device instance can only have one watch on any particular key. 
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SS Notification Record .PP Key-specific notification messages that the kernel emits into the buffer have the following format: .PP .in +4n .EX struct key_notification { struct watch_notification watch; __u32 key_id; __u32 aux; }; .EE .in .PP The .I watch.type field will be set to .B WATCH_TYPE_KEY_NOTIFY and the .I watch.subtype field will contain one of the constants listed below, indicating the event that occurred. The watch_id passed to keyctl_watch_key() will be placed in .I watch.info in the ID field. The following events are defined: .TP .B NOTIFY_KEY_INSTANTIATED This indicates that a watched key got instantiated or negatively instantiated. .I key_id indicates the key that was instantiated and .I aux is unused. .TP .B NOTIFY_KEY_UPDATED This indicates that a watched key got updated or instantiated by update. .I key_id indicates the key that was updated and .I aux is unused. .TP .B NOTIFY_KEY_LINKED This indicates that a key got linked into a watched keyring. .I key_id indicates the keyring that was modified and .I aux indicates the key that was added. .TP .B NOTIFY_KEY_UNLINKED This indicates that a key got unlinked from a watched keyring. .I key_id indicates the keyring that was modified and .I aux indicates the key that was removed. .TP .B NOTIFY_KEY_CLEARED This indicates that a watched keyring got cleared. .I key_id indicates the keyring that was cleared and .I aux is unused. .TP .B NOTIFY_KEY_REVOKED This indicates that a watched key got revoked. .I key_id indicates the key that was revoked and .I aux is unused. .TP .B NOTIFY_KEY_INVALIDATED This indicates that a watched key got invalidated. .I key_id indicates the key that was invalidated and .I aux is unused. .TP .B NOTIFY_KEY_SETATTR This indicates that a watched key had its attributes (owner, group, permissions, timeout) modified. .I key_id indicates the key that was modified and .I aux is unused.
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SS Removal Notification When a watched key is garbage collected, all of its watches are automatically destroyed and a notification is delivered to each watcher. This will normally be an extended notification of the form: .PP .in +4n .EX struct watch_notification_removal { struct watch_notification watch; __u64 id; }; .EE .in .PP The .I watch.type field will be set to .B WATCH_TYPE_META and the .I watch.subtype field will contain .BR WATCH_META_REMOVAL_NOTIFICATION . If the extended notification is given, then the length will be 2 units, otherwise it will be 1 and only the header will be present. .PP The watch_id passed to .IR keyctl_watch_key () will be placed in .I watch.info in the ID field. .PP If the extension is present, .I id will be set to the ID of the destroyed key. .PP .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SH RETURN VALUE On success .BR keyctl_watch_key () returns .B 0 . On error, the value .B -1 will be returned and .I errno will have been set to an appropriate error. .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SH ERRORS .TP .B ENOKEY The specified key does not exist. .TP .B EKEYEXPIRED The specified key has expired. .TP .B EKEYREVOKED The specified key has been revoked. .TP .B EACCES The named key exists, but does not grant .B view permission to the calling process. .TP .B EBUSY The specified key already has a watch on it for that device instance (add only). .TP .B EBADSLT The specified key doesn't have a watch on it (removal only). .\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SH LINKING This is a library function that can be found in .IR libkeyutils . When linking, .B \-lkeyutils should be specified to the linker. 
.\""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""" .SH SEE ALSO .ad l .nh .BR keyctl (1), .BR add_key (2), .BR keyctl (2), .BR request_key (2), .BR keyctl (3), .BR keyrings (7), .BR keyutils (7) ^ permalink raw reply [flat|nested] 234+ messages in thread
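The event list in the man page above says that aux only carries a key ID for link and unlink events. A minimal sketch of how a reader of the buffer might act on that is given below. The demo_* names are hypothetical stand-ins, and the numeric values of the enum are assumed for illustration only; real code should take the NOTIFY_KEY_* constants from the uapi header.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the NOTIFY_KEY_* subtypes.  The numeric
 * values are assumed for illustration and are not taken from the patch. */
enum demo_key_event {
	DEMO_KEY_INSTANTIATED,
	DEMO_KEY_UPDATED,
	DEMO_KEY_LINKED,
	DEMO_KEY_UNLINKED,
	DEMO_KEY_CLEARED,
	DEMO_KEY_REVOKED,
	DEMO_KEY_INVALIDATED,
	DEMO_KEY_SETATTR,
};

/* The two key-specific words that follow the common record header. */
struct demo_key_payload {
	uint32_t key_id;	/* Key or keyring the event applies to */
	uint32_t aux;		/* Linked/unlinked key, otherwise unused */
};

/* Returns non-zero if aux holds a key ID for this event. */
static int demo_aux_is_key(enum demo_key_event ev)
{
	return ev == DEMO_KEY_LINKED || ev == DEMO_KEY_UNLINKED;
}

/* Short human-readable name for an event, for logging. */
static const char *demo_describe(enum demo_key_event ev)
{
	switch (ev) {
	case DEMO_KEY_INSTANTIATED:	return "instantiated";
	case DEMO_KEY_UPDATED:		return "updated";
	case DEMO_KEY_LINKED:		return "linked";
	case DEMO_KEY_UNLINKED:		return "unlinked";
	case DEMO_KEY_CLEARED:		return "cleared";
	case DEMO_KEY_REVOKED:		return "revoked";
	case DEMO_KEY_INVALIDATED:	return "invalidated";
	case DEMO_KEY_SETATTR:		return "setattr";
	}
	return "unknown";
}
```

A consumer would call demo_aux_is_key() on the subtype before printing aux, so that the unused field is never reported as a key ID.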
* Re: [PATCH 00/11] Keyrings, Block and USB notifications [ver #7] 2019-08-30 13:57 ` David Howells @ 2019-08-30 22:09 ` Casey Schaufler -1 siblings, 0 replies; 234+ messages in thread From: Casey Schaufler @ 2019-08-30 22:09 UTC (permalink / raw) To: David Howells, viro Cc: Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel, casey On 8/30/2019 6:57 AM, David Howells wrote: > Here's a set of patches to add a general notification queue concept and to > add sources of events for: > > (1) Key/keyring events, such as creating, linking and removal of keys. > > (2) General device events (single common queue) including: > > - Block layer events, such as device errors > > - USB subsystem events, such as device/bus attach/remove, device > reset, device errors. > > Tests for the key/keyring events can be found on the keyutils next branch: > > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next I'm having trouble with the "make install" on Fedora. Is there an unusual dependency? > > Notifications are done automatically inside of the testing infrastructure > on every change to that every test makes to a key or keyring. > > Manual pages can be found there also, including pages for watch_queue(7) > and the watch_devices(2) system call (these should be transferred to the > manpages package if taken upstream). > > LSM hooks are included: > > (1) A set of hooks are provided that allow an LSM to rule on whether or > not a watch may be set. Each of these hooks takes a different > "watched object" parameter, so they're not really shareable. The LSM > should use current's credentials. [Wanted by SELinux & Smack] > > (2) A hook is provided to allow an LSM to rule on whether or not a > particular message may be posted to a particular queue. 
This is given > the credentials from the event generator (which may be the system) and > the watch setter. [Wanted by Smack] > > I've provided a preliminary attempt to provide SELinux and Smack with > implementations of some of these hooks. > > > Design decisions: > > (1) A misc chardev is used to create and open a ring buffer: > > fd = open("/dev/watch_queue", O_RDWR); > > which is then configured and mmap'd into userspace: > > ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); > ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); > buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE, > MAP_SHARED, fd, 0); > > The fd cannot be read or written (though there is a facility to use > write to inject records for debugging) and userspace just pulls data > directly out of the buffer. > > (2) The ring index pointers are stored inside the ring and are thus > accessible to userspace. Userspace should only update the tail > pointer and never the head pointer or risk breaking the buffer. The > kernel checks that the pointers appear valid before trying to use > them. A 'skip' record is maintained around the pointers. > > (3) poll() can be used to wait for data to appear in the buffer. > > (4) Records in the buffer are binary, typed and have a length so that they > can be of varying size. > > This means that multiple heterogeneous sources can share a common > buffer. Tags may be specified when a watchpoint is created to help > distinguish the sources. > > (5) The queue is reusable as there are 16 million types available, of > which I've used just a few, so there is scope for others to be used. > > (6) Records are filterable as types have up to 256 subtypes that can be > individually filtered. Other filtration is also available. > > (7) Each time the buffer is opened, a new buffer is created - this means > that there's no interference between watchers. 
> > (8) When recording a notification, the kernel will not sleep, but will > rather mark a queue as overrun if there's insufficient space, thereby > avoiding userspace causing the kernel to hang. > > (9) The 'watchpoint' should be specific where possible, meaning that you > specify the object that you want to watch. > > (10) The buffer is created and then watchpoints are attached to it, using > one of: > > keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01); > watch_devices(fd, 0x02, 0); > > where in both cases, fd indicates the queue and the number after is a > tag between 0 and 255. > > (11) The watch must be removed if either the watch buffer is destroyed or > the watched object is destroyed. > > > Things I want to avoid: > > (1) Introducing features that make the core VFS dependent on the network > stack or networking namespaces (ie. usage of netlink). > > (2) Dumping all this stuff into dmesg and having a daemon that sits there > parsing the output and distributing it as this then puts the > responsibility for security into userspace and makes handling > namespaces tricky. Further, dmesg might not exist or might be > inaccessible inside a container. > > (3) Letting users see events they shouldn't be able to see. > > > The patches can be found here also: > > http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications-core > > Changes: > > ver #7: > > (*) Removed the 'watch' argument from the security_watch_key() and > security_watch_devices() hooks as current_cred() can be used instead > of watch->cred. > > ver #6: > > (*) Fix mmap bug in watch_queue driver. > > (*) Add an extended removal notification that can transmit an identifier > to userspace (such as a key ID). > > (*) Don't produce a instantiation notification in mark_key_instantiated() > but rather do it in the caller to prevent key updates from producing > an instantiate notification as well as an update notification. 
> > (*) Set the right number of filters in the sample program. > > (*) Provide preliminary hook implementations for SELinux and Smack. > > ver #5: > > (*) Split the superblock watch and mount watch parts out into their own > branch (notifications-mount) as they really need certain fsinfo() > attributes. > > (*) Rearrange the watch notification UAPI header to push the length down > to bits 0-5 and remove the lost-message bits. The userspace's watch > ID tag is moved to bits 8-15 and then the message type is allocated > all of bits 16-31 for its own purposes. > > The lost-message bit is moved over to the header, rather than being > placed in the next message to be generated and given its own word so > it can be cleared with xchg(,0) for parisc. > > (*) The security_post_notification() hook is no longer called with the > spinlock held and softirqs disabled - though the RCU readlock is still > held. > > (*) Buffer pages are now accounted towards RLIMIT_MEMLOCK and CAP_IPC_LOCK > will skip the overuse check. > > (*) The buffer is marked VM_DONTEXPAND. > > (*) Save the watch-setter's creds in struct watch and give that to the LSM > hook for posting a message. > > ver #4: > > (*) Split the basic UAPI bits out into their own patch and then split the > LSM hooks out into an intermediate patch. Add LSM hooks for setting > watches. > > Rename the *_notify() system calls to watch_*() for consistency. > > ver #3: > > (*) I've added a USB notification source and reformulated the block > notification source so that there's now a common watch list, for which > the system call is now device_notify(). > > I've assigned a pair of unused ioctl numbers in the 'W' series to the > ioctls added by this series. > > I've also added a description of the kernel API to the documentation. > > ver #2: > > (*) I've fixed various issues raised by Jann Horn and GregKH and moved to > krefs for refcounting. I've added some security features to try and > give Casey Schaufler the LSM control he wants. 
> > David > --- > David Howells (11): > uapi: General notification ring definitions > security: Add hooks to rule on setting a watch > security: Add a hook for the point of notification insertion > General notification queue with user mmap()'able ring buffer > keys: Add a notification facility > Add a general, global device notification watch list > block: Add block layer notifications > usb: Add USB subsystem notifications > Add sample notification program > selinux: Implement the watch_key security hook > smack: Implement the watch_key and post_notification hooks [untested] > > > Documentation/ioctl/ioctl-number.rst | 1 > Documentation/security/keys/core.rst | 58 ++ > Documentation/watch_queue.rst | 460 ++++++++++++++ > arch/alpha/kernel/syscalls/syscall.tbl | 1 > arch/arm/tools/syscall.tbl | 1 > arch/ia64/kernel/syscalls/syscall.tbl | 1 > arch/m68k/kernel/syscalls/syscall.tbl | 1 > arch/microblaze/kernel/syscalls/syscall.tbl | 1 > arch/mips/kernel/syscalls/syscall_n32.tbl | 1 > arch/mips/kernel/syscalls/syscall_n64.tbl | 1 > arch/mips/kernel/syscalls/syscall_o32.tbl | 1 > arch/parisc/kernel/syscalls/syscall.tbl | 1 > arch/powerpc/kernel/syscalls/syscall.tbl | 1 > arch/s390/kernel/syscalls/syscall.tbl | 1 > arch/sh/kernel/syscalls/syscall.tbl | 1 > arch/sparc/kernel/syscalls/syscall.tbl | 1 > arch/x86/entry/syscalls/syscall_32.tbl | 1 > arch/x86/entry/syscalls/syscall_64.tbl | 1 > arch/xtensa/kernel/syscalls/syscall.tbl | 1 > block/Kconfig | 9 > block/blk-core.c | 29 + > drivers/base/Kconfig | 9 > drivers/base/Makefile | 1 > drivers/base/watch.c | 90 +++ > drivers/misc/Kconfig | 13 > drivers/misc/Makefile | 1 > drivers/misc/watch_queue.c | 893 +++++++++++++++++++++++++++ > drivers/usb/core/Kconfig | 9 > drivers/usb/core/devio.c | 56 ++ > drivers/usb/core/hub.c | 4 > include/linux/blkdev.h | 15 > include/linux/device.h | 7 > include/linux/key.h | 3 > include/linux/lsm_audit.h | 1 > include/linux/lsm_hooks.h | 38 + > include/linux/security.h | 32 + > 
include/linux/syscalls.h | 1 > include/linux/usb.h | 18 + > include/linux/watch_queue.h | 94 +++ > include/uapi/asm-generic/unistd.h | 4 > include/uapi/linux/keyctl.h | 2 > include/uapi/linux/watch_queue.h | 183 ++++++ > kernel/sys_ni.c | 1 > samples/Kconfig | 6 > samples/Makefile | 1 > samples/watch_queue/Makefile | 8 > samples/watch_queue/watch_test.c | 233 +++++++ > security/keys/Kconfig | 9 > security/keys/compat.c | 3 > security/keys/gc.c | 5 > security/keys/internal.h | 30 + > security/keys/key.c | 38 + > security/keys/keyctl.c | 99 +++ > security/keys/keyring.c | 20 - > security/keys/request_key.c | 4 > security/security.c | 23 + > security/selinux/hooks.c | 14 > security/smack/smack_lsm.c | 82 ++ > 58 files changed, 2593 insertions(+), 30 deletions(-) > create mode 100644 Documentation/watch_queue.rst > create mode 100644 drivers/base/watch.c > create mode 100644 drivers/misc/watch_queue.c > create mode 100644 include/linux/watch_queue.h > create mode 100644 include/uapi/linux/watch_queue.h > create mode 100644 samples/watch_queue/Makefile > create mode 100644 samples/watch_queue/watch_test.c > ^ permalink raw reply [flat|nested] 234+ messages in thread
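The ver #5 notes quoted above place the record length in bits 0-5 of the info word, the watch ID tag in bits 8-15 and the type-specific part in bits 16-31. The following is a sketch of unpacking such a word under exactly those assumptions. The DEMO_* macros and demo_* functions are hypothetical names, not the ones in the uapi header.

```c
#include <stdint.h>

/* Bit layout as described in the ver #5 changelog (names are made up). */
#define DEMO_INFO_LENGTH_MASK	0x0000003fu	/* bits 0-5: record length */
#define DEMO_INFO_ID_MASK	0x0000ff00u	/* bits 8-15: watch ID tag */
#define DEMO_INFO_ID_SHIFT	8
#define DEMO_INFO_TYPE_SHIFT	16		/* bits 16-31: type-specific */

struct demo_info {
	unsigned int length;	/* Length in whatever unit the header uses */
	unsigned int id;	/* Tag given when the watch was set (0-255) */
	unsigned int type_info;	/* Meaning depends on the record type */
};

/* Split an info word into its three fields. */
static struct demo_info demo_unpack_info(uint32_t info)
{
	struct demo_info d;

	d.length = info & DEMO_INFO_LENGTH_MASK;
	d.id = (info & DEMO_INFO_ID_MASK) >> DEMO_INFO_ID_SHIFT;
	d.type_info = info >> DEMO_INFO_TYPE_SHIFT;
	return d;
}

/* Inverse of demo_unpack_info(); out-of-range inputs are masked off. */
static uint32_t demo_pack_info(unsigned int length, unsigned int id,
			       unsigned int type_info)
{
	return (length & DEMO_INFO_LENGTH_MASK) |
	       ((uint32_t)(id & 0xffu) << DEMO_INFO_ID_SHIFT) |
	       ((uint32_t)(type_info & 0xffffu) << DEMO_INFO_TYPE_SHIFT);
}
```

With this layout the tag passed to keyctl_watch_key() or watch_devices() comes back as the id field, which is how a reader sharing one buffer between several watches can tell the sources apart.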
* Re: [PATCH 00/11] Keyrings, Block and USB notifications [ver #7] 2019-08-30 13:57 ` David Howells @ 2019-09-02 12:39 ` David Howells -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-02 12:39 UTC (permalink / raw)
To: Casey Schaufler
Cc: dhowells, viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Casey Schaufler <casey@schaufler-ca.com> wrote:

> > Tests for the key/keyring events can be found on the keyutils next branch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next
>
> I'm having trouble with the "make install" on Fedora. Is there an
> unusual dependency?

What's the symptom you're seeing? Is it this:

install -D -m 0644 libkeyutils.a /tmp/opt/lib64 libcrypt.so.2 => /lib64/libcrypt.so.2 (0x00007f7dcbf6d000)/libkeyutils.a
/bin/sh: -c: line 0: syntax error near unexpected token `('
/bin/sh: -c: line 0: `install -D -m 0644 libkeyutils.a /tmp/opt/lib64 libcrypt.so.2 => /lib64/libcrypt.so.2 (0x00007f7dcbf6d000)/libkeyutils.a'

David
^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: [PATCH 00/11] Keyrings, Block and USB notifications [ver #7] 2019-08-30 13:57 ` David Howells @ 2019-09-02 13:26 ` David Howells -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-02 13:26 UTC (permalink / raw)
To: Casey Schaufler
Cc: dhowells, viro, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Casey Schaufler <casey@schaufler-ca.com> wrote:

> > Tests for the key/keyring events can be found on the keyutils next branch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next
>
> I'm having trouble with the "make install" on Fedora. Is there an
> unusual dependency?

I've pushed a couple of patches to my next branch. Do "make install" and "make rpm" now work for you?

David
^ permalink raw reply [flat|nested] 234+ messages in thread
* Re: [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #7] 2019-08-30 13:57 ` David Howells @ 2019-09-03 16:06 ` David Howells -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-09-03 16:06 UTC (permalink / raw)
To: Hillf Danton
Cc: dhowells, viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Hillf Danton <hdanton@sina.com> wrote:

> > +	smp_store_release(&buf->meta.head, head);
>
> Add a line of comment for the pairing smp_load_acquire().
> I did not find it in 04/11.

You won't find the pairing smp_load_acquire() in the kernel: the corresponding barrier is in userspace, and you'll find it if you look in the sample. Note that there's a further implicit barrier you don't see.

I've added the comments:

	/* Barrier against userspace, ordering data read before tail read */
	ring_tail = READ_ONCE(buf->meta.tail);

and:

	/* Barrier against userspace, ordering head update after data write. */
	smp_store_release(&buf->meta.head, head);

David
^ permalink raw reply [flat|nested] 234+ messages in thread
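The pairing discussed above, a release store of head on the writing side matched by an acquire on the reading side, can be illustrated with a userspace-only analogue using C11 atomics. This is a sketch under stated assumptions and not the kernel or sample code: struct demo_ring and the demo_* functions are hypothetical, the ring holds plain 32-bit values, and there is no 'skip' record or overrun flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u	/* Must be a power of two */

struct demo_ring {
	_Atomic uint32_t head;	/* Advanced by the producer only */
	_Atomic uint32_t tail;	/* Advanced by the consumer only */
	uint32_t slots[DEMO_RING_SIZE];
};

/* Producer: write the slot, then publish it with a release store of head. */
static bool demo_produce(struct demo_ring *r, uint32_t value)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Acquire pairs with the consumer's release store of tail, so the
	 * consumer has finished reading any slot we are about to reuse. */
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail >= DEMO_RING_SIZE)
		return false;	/* Full: report it rather than wait */

	r->slots[head & (DEMO_RING_SIZE - 1)] = value;
	/* Release orders the slot write before the head update. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer: acquire head, read the slot, then release the slot via tail. */
static bool demo_consume(struct demo_ring *r, uint32_t *value)
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire pairs with the producer's release store of head. */
	uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;	/* Empty */

	*value = r->slots[tail & (DEMO_RING_SIZE - 1)];
	/* Release orders the slot read before the tail update. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

Returning false when the ring is full mirrors the design decision that the writer never sleeps waiting for the reader; what the real queue does at that point (marking an overrun) is left out of the sketch.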
* Re: [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #7] 2019-08-30 13:57 ` David Howells @ 2019-09-03 16:37 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-09-03 16:37 UTC (permalink / raw) To: Hillf Danton Cc: dhowells, viro, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel Hillf Danton <hdanton@sina.com> wrote: > > + for (i = 0; i < wf->nr_filters; i++) { > > + wt = &wf->filters[i]; > > + if (n->type == wt->type && > > + (wt->subtype_filter[n->subtype >> 5] & > > + (1U << (n->subtype & 31))) && > > Replace the pure numbers with something easier to understand. How about the following: static bool filter_watch_notification(const struct watch_filter *wf, const struct watch_notification *n) { const struct watch_type_filter *wt; unsigned int st_bits = sizeof(wt->subtype_filter[0]) * 8; unsigned int st_index = n->subtype / st_bits; unsigned int st_bit = 1U << (n->subtype % st_bits); int i; if (!test_bit(n->type, wf->type_filter)) return false; for (i = 0; i < wf->nr_filters; i++) { wt = &wf->filters[i]; if (n->type == wt->type && (wt->subtype_filter[st_index] & st_bit) && (n->info & wt->info_mask) == wt->info_filter) return true; } return false; /* If there is a filter, the default is to reject. */ } David ^ permalink raw reply [flat|nested] 234+ messages in thread
* [PATCH 00/11] Keyrings, Block and USB notifications [ver #6] @ 2019-08-29 18:29 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:29 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Here's a set of patches to add a general notification queue concept and to add sources of events for: (1) Key/keyring events, such as creating, linking and removal of keys. (2) General device events (single common queue) including: - Block layer events, such as device errors - USB subsystem events, such as device/bus attach/remove, device reset, device errors. Tests for the key/keyring events can be found on the keyutils next branch: https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=next Notifications are done automatically inside of the testing infrastructure on every change that every test makes to a key or keyring. Manual pages can be found there also, including pages for watch_queue(7) and the watch_devices(2) system call (these should be transferred to the manpages package if taken upstream). LSM hooks are included: (1) A set of hooks is provided that allows an LSM to rule on whether or not a watch may be set. Each of these hooks takes a different "watched object" parameter, so they're not really shareable. The LSM should use current's credentials. [Wanted by SELinux & Smack] (2) A hook is provided to allow an LSM to rule on whether or not a particular message may be posted to a particular queue. This is given the credentials from the event generator (which may be the system) and the watch setter. [Wanted by Smack] I've made a preliminary attempt to provide SELinux and Smack with implementations of some of these hooks. 
Design decisions: (1) A misc chardev is used to create and open a ring buffer: fd = open("/dev/watch_queue", O_RDWR); which is then configured and mmap'd into userspace: ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); The fd cannot be read or written (though there is a facility to use write to inject records for debugging) and userspace just pulls data directly out of the buffer. (2) The ring index pointers are stored inside the ring and are thus accessible to userspace. Userspace should only update the tail pointer and never the head pointer or risk breaking the buffer. The kernel checks that the pointers appear valid before trying to use them. A 'skip' record is maintained around the pointers. (3) poll() can be used to wait for data to appear in the buffer. (4) Records in the buffer are binary, typed and have a length so that they can be of varying size. This means that multiple heterogeneous sources can share a common buffer. Tags may be specified when a watchpoint is created to help distinguish the sources. (5) The queue is reusable as there are 16 million types available, of which I've used just a few, so there is scope for others to be used. (6) Records are filterable as types have up to 256 subtypes that can be individually filtered. Other filtration is also available. (7) Each time the buffer is opened, a new buffer is created - this means that there's no interference between watchers. (8) When recording a notification, the kernel will not sleep, but will rather mark a queue as overrun if there's insufficient space, thereby avoiding userspace causing the kernel to hang. (9) The 'watchpoint' should be specific where possible, meaning that you specify the object that you want to watch. 
(10) The buffer is created and then watchpoints are attached to it, using one of: keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01); watch_devices(fd, 0x02, 0); where in both cases, fd indicates the queue and the number after is a tag between 0 and 255. (11) The watch must be removed if either the watch buffer is destroyed or the watched object is destroyed. Things I want to avoid: (1) Introducing features that make the core VFS dependent on the network stack or networking namespaces (i.e. usage of netlink). (2) Dumping all this stuff into dmesg and having a daemon that sits there parsing the output and distributing it as this then puts the responsibility for security into userspace and makes handling namespaces tricky. Further, dmesg might not exist or might be inaccessible inside a container. (3) Letting users see events they shouldn't be able to see. The patches can be found here also: http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=notifications-core Changes: ver #6: (*) Fix mmap bug in watch_queue driver. (*) Add an extended removal notification that can transmit an identifier to userspace (such as a key ID). (*) Don't produce an instantiation notification in mark_key_instantiated() but rather do it in the caller to prevent key updates from producing an instantiation notification as well as an update notification. (*) Set the right number of filters in the sample program. (*) Provide preliminary hook implementations for SELinux and Smack. ver #5: (*) Split the superblock watch and mount watch parts out into their own branch (notifications-mount) as they really need certain fsinfo() attributes. (*) Rearrange the watch notification UAPI header to push the length down to bits 0-5 and remove the lost-message bits. The userspace's watch ID tag is moved to bits 8-15 and then the message type is allocated all of bits 16-31 for its own purposes. 
The lost-message bit is moved over to the header, rather than being placed in the next message to be generated and given its own word so it can be cleared with xchg(,0) for parisc. (*) The security_post_notification() hook is no longer called with the spinlock held and softirqs disabled - though the RCU readlock is still held. (*) Buffer pages are now accounted towards RLIMIT_MEMLOCK and CAP_IPC_LOCK will skip the overuse check. (*) The buffer is marked VM_DONTEXPAND. (*) Save the watch-setter's creds in struct watch and give that to the LSM hook for posting a message. ver #4: (*) Split the basic UAPI bits out into their own patch and then split the LSM hooks out into an intermediate patch. Add LSM hooks for setting watches. Rename the *_notify() system calls to watch_*() for consistency. ver #3: (*) I've added a USB notification source and reformulated the block notification source so that there's now a common watch list, for which the system call is now device_notify(). I've assigned a pair of unused ioctl numbers in the 'W' series to the ioctls added by this series. I've also added a description of the kernel API to the documentation. ver #2: (*) I've fixed various issues raised by Jann Horn and GregKH and moved to krefs for refcounting. I've added some security features to try and give Casey Schaufler the LSM control he wants. 
David --- David Howells (11): uapi: General notification ring definitions security: Add hooks to rule on setting a watch security: Add a hook for the point of notification insertion General notification queue with user mmap()'able ring buffer keys: Add a notification facility Add a general, global device notification watch list block: Add block layer notifications usb: Add USB subsystem notifications Add sample notification program selinux: Implement the watch_key security hook smack: Implement the watch_key and post_notification hooks [untested] Documentation/ioctl/ioctl-number.rst | 1 Documentation/security/keys/core.rst | 58 ++ Documentation/watch_queue.rst | 460 ++++++++++++++ arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/xtensa/kernel/syscalls/syscall.tbl | 1 block/Kconfig | 9 block/blk-core.c | 29 + drivers/base/Kconfig | 9 drivers/base/Makefile | 1 drivers/base/watch.c | 94 +++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 892 +++++++++++++++++++++++++++ drivers/usb/core/Kconfig | 9 drivers/usb/core/devio.c | 56 ++ drivers/usb/core/hub.c | 4 include/linux/blkdev.h | 15 include/linux/device.h | 7 include/linux/key.h | 3 include/linux/lsm_audit.h | 1 include/linux/lsm_hooks.h | 32 + include/linux/security.h | 25 + include/linux/syscalls.h | 1 include/linux/usb.h | 18 + include/linux/watch_queue.h | 94 +++ 
include/uapi/asm-generic/unistd.h | 4 include/uapi/linux/keyctl.h | 2 include/uapi/linux/watch_queue.h | 183 ++++++ kernel/sys_ni.c | 1 samples/Kconfig | 6 samples/Makefile | 1 samples/watch_queue/Makefile | 8 samples/watch_queue/watch_test.c | 233 +++++++ security/keys/Kconfig | 9 security/keys/compat.c | 3 security/keys/gc.c | 5 security/keys/internal.h | 30 + security/keys/key.c | 38 + security/keys/keyctl.c | 103 +++ security/keys/keyring.c | 20 - security/keys/request_key.c | 4 security/security.c | 19 + security/selinux/hooks.c | 17 + security/smack/smack_lsm.c | 81 ++ 58 files changed, 2587 insertions(+), 28 deletions(-) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/base/watch.c create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h create mode 100644 include/uapi/linux/watch_queue.h create mode 100644 samples/watch_queue/Makefile create mode 100644 samples/watch_queue/watch_test.c ^ permalink raw reply [flat|nested] 234+ messages in thread
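The info word layout given in the ver #5 changes above (length in bits 0-5, watch ID tag in bits 8-15, type-specific info in bits 16-31) can be sketched as a pair of pack/unpack helpers. This is an illustrative userspace fragment only: the `DEMO_*` macros and `demo_*` functions are hypothetical names chosen to avoid clashing with the real UAPI header, and the unit in which the length field is counted is left to that header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical masks mirroring the bit ranges described in the changelog. */
#define DEMO_INFO_LENGTH		0x0000003fu	/* bits 0-5 */
#define DEMO_INFO_LENGTH_SHIFT		0
#define DEMO_INFO_ID			0x0000ff00u	/* bits 8-15 */
#define DEMO_INFO_ID_SHIFT		8
#define DEMO_INFO_TYPE_INFO		0xffff0000u	/* bits 16-31 */
#define DEMO_INFO_TYPE_INFO_SHIFT	16

struct demo_info {
	unsigned int length;	/* record length field, in header-defined units */
	unsigned int id;	/* tag given when the watch was set (0-255) */
	unsigned int type_info;	/* type-specific information */
};

/* Combine the three fields into one 32-bit info word. */
static uint32_t demo_pack_info(unsigned int length, unsigned int id,
			       unsigned int type_info)
{
	assert(length <= 0x3f && id <= 0xff && type_info <= 0xffff);
	return ((uint32_t)length << DEMO_INFO_LENGTH_SHIFT) |
	       ((uint32_t)id << DEMO_INFO_ID_SHIFT) |
	       ((uint32_t)type_info << DEMO_INFO_TYPE_INFO_SHIFT);
}

/* Split an info word back into its fields. */
static struct demo_info demo_unpack_info(uint32_t info)
{
	struct demo_info out = {
		.length = (info & DEMO_INFO_LENGTH) >> DEMO_INFO_LENGTH_SHIFT,
		.id = (info & DEMO_INFO_ID) >> DEMO_INFO_ID_SHIFT,
		.type_info = (info & DEMO_INFO_TYPE_INFO) >>
			     DEMO_INFO_TYPE_INFO_SHIFT,
	};
	return out;
}
```

A consumer would typically use the ID field to tell apart the sources it attached with different tags, such as the 0x01 and 0x02 tags in the keyctl_watch_key() and watch_devices() calls above.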
* [PATCH 01/11] uapi: General notification ring definitions [ver #6]
From: David Howells @ 2019-08-29 18:29 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
    nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings,
    linux-usb, linux-security-module, linux-fsdevel, linux-api,
    linux-block, linux-security-module, linux-kernel

Add UAPI definitions for the general notification ring, including the
following pieces:

 (1) struct watch_notification.

     This is the metadata header for each entry in the ring.  It includes a
     type and subtype that indicate the source of the message
     (eg. WATCH_TYPE_MOUNT_NOTIFY) and the kind of the message
     (eg. NOTIFY_MOUNT_NEW_MOUNT).

     The header also contains an information field that conveys the
     following information:

	- WATCH_INFO_LENGTH.  The size of the entry (entries are variable
          length).

	- WATCH_INFO_ID.  The watch ID specified when the watchpoint was
          set.

	- WATCH_INFO_TYPE_INFO.  (Sub)type-specific information.

	- WATCH_INFO_FLAG_*.  Flag bits overlain on the type-specific
          information.  For use by the type.

     All the information in the header can be used in filtering messages at
     the point of writing into the buffer.

 (2) struct watch_queue_buffer.

     This describes the layout of the ring.  Note that the first slots in
     the ring contain a special metadata entry that contains the ring
     pointers.  The producer in the kernel knows to skip this and it has a
     proper header (WATCH_TYPE_META, WATCH_META_SKIP_NOTIFICATION) that
     indicates the size so that the ring consumer can handle it the same as
     any other record and just skip it.

     Note that this means that ring entries can never be split over the end
     of the ring, so if an entry would need to be split, a skip record is
     inserted to wrap the ring first; this is also WATCH_TYPE_META,
     WATCH_META_SKIP_NOTIFICATION.

 (3) WATCH_INFO_NOTIFICATIONS_LOST.

     This is a flag that can be set in the metadata header by the kernel to
     indicate that at least one message was lost since it was last cleared
     by userspace.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

 include/uapi/linux/watch_queue.h |   67 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 include/uapi/linux/watch_queue.h

diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
new file mode 100644
index 000000000000..70f575099968
--- /dev/null
+++ b/include/uapi/linux/watch_queue.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_WATCH_QUEUE_H
+#define _UAPI_LINUX_WATCH_QUEUE_H
+
+#include <linux/types.h>
+
+enum watch_notification_type {
+	WATCH_TYPE_META		= 0,	/* Special record */
+	WATCH_TYPE___NR		= 1
+};
+
+enum watch_meta_notification_subtype {
+	WATCH_META_SKIP_NOTIFICATION	= 0,	/* Just skip this record */
+	WATCH_META_REMOVAL_NOTIFICATION	= 1,	/* Watched object was removed */
+};
+
+#define WATCH_LENGTH_GRANULARITY sizeof(__u64)
+
+/*
+ * Notification record header.  This is aligned to 64-bits so that subclasses
+ * can contain __u64 fields.
+ */
+struct watch_notification {
+	__u32			type:24;	/* enum watch_notification_type */
+	__u32			subtype:8;	/* Type-specific subtype (filterable) */
+	__u32			info;
+#define WATCH_INFO_LENGTH	0x0000003f	/* Length of record / sizeof(watch_notification) */
+#define WATCH_INFO_LENGTH__SHIFT 0
+#define WATCH_INFO_ID		0x0000ff00	/* ID of watchpoint, if type-appropriate */
+#define WATCH_INFO_ID__SHIFT	8
+#define WATCH_INFO_TYPE_INFO	0xffff0000	/* Type-specific info */
+#define WATCH_INFO_TYPE_INFO__SHIFT 16
+#define WATCH_INFO_FLAG_0	0x00010000	/* Type-specific info, flag bit 0 */
+#define WATCH_INFO_FLAG_1	0x00020000	/* ... */
+#define WATCH_INFO_FLAG_2	0x00040000
+#define WATCH_INFO_FLAG_3	0x00080000
+#define WATCH_INFO_FLAG_4	0x00100000
+#define WATCH_INFO_FLAG_5	0x00200000
+#define WATCH_INFO_FLAG_6	0x00400000
+#define WATCH_INFO_FLAG_7	0x00800000
+} __attribute__((aligned(WATCH_LENGTH_GRANULARITY)));
+
+struct watch_queue_buffer {
+	union {
+		/* The first few entries are special, containing the
+		 * ring management variables.
+		 */
+		struct {
+			struct watch_notification watch; /* WATCH_TYPE_META */
+			__u32		head;		/* Ring head index */
+			__u32		tail;		/* Ring tail index */
+			__u32		mask;		/* Ring index mask */
+			__u32		__reserved;
+		} meta;
+		struct watch_notification slots[0];
+	};
+};
+
+/*
+ * The Metadata pseudo-notification message uses a flag bit in the information
+ * field to convey the fact that messages have been lost.  We can only use a
+ * single bit in this manner per word as some arches that support SMP
+ * (eg. parisc) have no kernel<->user atomic bit ops.
+ */
+#define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0
+
+#endif /* _UAPI_LINUX_WATCH_QUEUE_H */

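For illustration only, the header layout in this patch could be consumed from
userspace roughly as follows.  This is a minimal sketch and not part of the
patch: the mask and shift values are copied from the proposed header, but the
helpers watch_info_length(), watch_info_id(), watch_info_lost() and
count_real_records() are hypothetical names, the walk is over a plain array
rather than a live mmap'd ring, and it assumes (per the header comment) that
the length field counts 8-byte slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the proposed include/uapi/linux/watch_queue.h. */
#define WATCH_TYPE_META			0
#define WATCH_META_SKIP_NOTIFICATION	0
#define WATCH_INFO_LENGTH		0x0000003f
#define WATCH_INFO_LENGTH__SHIFT	0
#define WATCH_INFO_ID			0x0000ff00
#define WATCH_INFO_ID__SHIFT		8
#define WATCH_INFO_FLAG_0		0x00010000
#define WATCH_INFO_NOTIFICATIONS_LOST	WATCH_INFO_FLAG_0

/* Same field layout as the patch, with <stdint.h> types for userspace. */
struct watch_notification {
	uint32_t type:24;	/* enum watch_notification_type */
	uint32_t subtype:8;	/* Type-specific subtype */
	uint32_t info;
} __attribute__((aligned(sizeof(uint64_t))));

/* Record length in slots, i.e. in units of sizeof(struct watch_notification). */
static unsigned int watch_info_length(uint32_t info)
{
	return (info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
}

/* Watch ID that was supplied when the watchpoint was set. */
static unsigned int watch_info_id(uint32_t info)
{
	return (info & WATCH_INFO_ID) >> WATCH_INFO_ID__SHIFT;
}

/* Only meaningful on the metadata record at the front of the ring. */
static int watch_info_lost(uint32_t info)
{
	return (info & WATCH_INFO_NOTIFICATIONS_LOST) != 0;
}

/*
 * Hypothetical helper: walk nr_slots slots and count the records that are
 * not (WATCH_TYPE_META, WATCH_META_SKIP_NOTIFICATION).  Skip records are
 * stepped over using their length like any other record.
 */
static size_t count_real_records(const struct watch_notification *slots,
				 size_t nr_slots)
{
	size_t i = 0, count = 0;

	assert(slots != NULL);
	while (i < nr_slots) {
		unsigned int len = watch_info_length(slots[i].info);

		if (len == 0)
			break;	/* Malformed record: stop rather than spin */
		if (!(slots[i].type == WATCH_TYPE_META &&
		      slots[i].subtype == WATCH_META_SKIP_NOTIFICATION))
			count++;
		i += len;
	}
	return count;
}
```

Because every record, including the metadata record holding head/tail/mask,
carries its own length, a consumer needs no special case for the front of the
ring beyond recognising the skip subtype.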
* [PATCH 02/11] security: Add hooks to rule on setting a watch [ver #6]
From: David Howells @ 2019-08-29 18:30 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
    nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings,
    linux-usb, linux-security-module, linux-fsdevel, linux-api,
    linux-block, linux-security-module, linux-kernel

Add security hooks that will allow an LSM to rule on whether or not a watch
may be set.  More than one hook is required as the watches watch different
types of object.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Casey Schaufler <casey@schaufler-ca.com>
cc: Stephen Smalley <sds@tycho.nsa.gov>
cc: linux-security-module@vger.kernel.org
---

 include/linux/lsm_hooks.h |   22 ++++++++++++++++++++++
 include/linux/security.h  |   15 +++++++++++++++
 security/security.c       |   13 +++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index df1318d85f7d..19108185b254 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1413,6 +1413,20 @@
  *	@ctx is a pointer in which to place the allocated security context.
  *	@ctxlen points to the place to put the length of @ctx.
  *
+ * Security hooks for the general notification queue:
+ *
+ * @watch_key:
+ *	Check to see if a process is allowed to watch for event notifications
+ *	from a key or keyring.
+ *	@watch: The watch object
+ *	@key: The key to watch.
+ *
+ * @watch_devices:
+ *	Check to see if a process is allowed to watch for event notifications
+ *	from devices (as a global set).
+ *	@watch: The watch object
+ *
+ *
  * Security hooks for using the eBPF maps and programs functionalities through
  * eBPF syscalls.
  *
@@ -1688,6 +1702,10 @@ union security_list_options {
 	int (*inode_notifysecctx)(struct inode *inode, void *ctx, u32 ctxlen);
 	int (*inode_setsecctx)(struct dentry *dentry, void *ctx, u32 ctxlen);
 	int (*inode_getsecctx)(struct inode *inode, void **ctx, u32 *ctxlen);
+#ifdef CONFIG_WATCH_QUEUE
+	int (*watch_key)(struct watch *watch, struct key *key);
+	int (*watch_devices)(struct watch *watch);
+#endif /* CONFIG_WATCH_QUEUE */
 
 #ifdef CONFIG_SECURITY_NETWORK
 	int (*unix_stream_connect)(struct sock *sock, struct sock *other,
@@ -1964,6 +1982,10 @@ struct security_hook_heads {
 	struct hlist_head inode_notifysecctx;
 	struct hlist_head inode_setsecctx;
 	struct hlist_head inode_getsecctx;
+#ifdef CONFIG_WATCH_QUEUE
+	struct hlist_head watch_key;
+	struct hlist_head watch_devices;
+#endif /* CONFIG_WATCH_QUEUE */
 #ifdef CONFIG_SECURITY_NETWORK
 	struct hlist_head unix_stream_connect;
 	struct hlist_head unix_may_send;
diff --git a/include/linux/security.h b/include/linux/security.h
index 5f7441abbf42..feeade454308 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -57,6 +57,7 @@ struct mm_struct;
 struct fs_context;
 struct fs_parameter;
 enum fs_value_type;
+struct watch;
 
 /* Default (no) options for the capable function */
 #define CAP_OPT_NONE 0x0
@@ -392,6 +393,10 @@ void security_inode_invalidate_secctx(struct inode *inode);
 int security_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen);
 int security_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen);
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen);
+#ifdef CONFIG_WATCH_QUEUE
+int security_watch_key(struct watch *watch, struct key *key);
+int security_watch_devices(struct watch *watch);
+#endif /* CONFIG_WATCH_QUEUE */
 #else /* CONFIG_SECURITY */
 
 static inline int call_blocking_lsm_notifier(enum lsm_event event, void *data)
@@ -1204,6 +1209,16 @@ static inline int security_inode_getsecctx(struct inode *inode, void **ctx, u32
 {
 	return -EOPNOTSUPP;
 }
+#ifdef CONFIG_WATCH_QUEUE
+static inline int security_watch_key(struct watch *watch, struct key *key)
+{
+	return 0;
+}
+static inline int security_watch_devices(struct watch *watch)
+{
+	return 0;
+}
+#endif /* CONFIG_WATCH_QUEUE */
 #endif /* CONFIG_SECURITY */
 
 #ifdef CONFIG_SECURITY_NETWORK
diff --git a/security/security.c b/security/security.c
index 250ee2d76406..1ebd2c936a57 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1916,6 +1916,19 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
 }
 EXPORT_SYMBOL(security_inode_getsecctx);
 
+#ifdef CONFIG_WATCH_QUEUE
+int security_watch_key(struct watch *watch, struct key *key)
+{
+	return call_int_hook(watch_key, 0, watch, key);
+}
+
+int security_watch_devices(struct watch *watch)
+{
+	return call_int_hook(watch_devices, 0, watch);
+}
+
+#endif /* CONFIG_WATCH_QUEUE */
+
 #ifdef CONFIG_SECURITY_NETWORK
 int security_unix_stream_connect(struct sock *sock, struct sock *other,
 				 struct sock *newsk)

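As a rough userspace model of how security_watch_key() dispatches, the sketch
below assumes that call_int_hook(watch_key, 0, ...) runs each registered hook
in turn, stops at the first non-zero result, and returns 0 if no hook objects.
It is not kernel code: struct watch and struct key are stand-ins, and
lsm_a_watch_key(), lsm_b_watch_key() and model_security_watch_key() are
hypothetical names invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel types; only enough to drive the example. */
struct watch { int id; };
struct key { int serial; };

typedef int (*watch_key_hook_t)(struct watch *watch, struct key *key);

/* Hypothetical LSM A: never objects to a watch being set. */
static int lsm_a_watch_key(struct watch *watch, struct key *key)
{
	(void)watch;
	(void)key;
	return 0;
}

/* Hypothetical LSM B: refuses watches on the key with serial 13. */
static int lsm_b_watch_key(struct watch *watch, struct key *key)
{
	(void)watch;
	return key->serial == 13 ? -EACCES : 0;
}

/* Models the per-hook list that the patch adds to security_hook_heads. */
static const watch_key_hook_t watch_key_hooks[] = {
	lsm_a_watch_key,
	lsm_b_watch_key,
};

/* Models security_watch_key(): first non-zero hook result wins. */
static int model_security_watch_key(struct watch *watch, struct key *key)
{
	size_t n = sizeof(watch_key_hooks) / sizeof(watch_key_hooks[0]);
	int rc = 0;	/* Default when no hook objects */

	assert(watch != NULL && key != NULL);
	for (size_t i = 0; i < n; i++) {
		rc = watch_key_hooks[i](watch, key);
		if (rc != 0)
			break;
	}
	return rc;
}
```

With no LSM registered the loop body never runs and the default of 0 is
returned, which matches the static inline stubs used when CONFIG_SECURITY is
disabled.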
* [PATCH 03/11] security: Add a hook for the point of notification insertion [ver #6] 2019-08-29 18:29 ` David Howells (?) @ 2019-08-29 18:30 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:30 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Add a security hook that allows an LSM to rule on whether a notification message is allowed to be inserted into a particular watch queue. The hook is given the following information: (1) The credentials of the triggerer (which may be init_cred for a system notification, eg. a hardware error). (2) The credentials of the whoever set the watch. (3) The notification message. Signed-off-by: David Howells <dhowells@redhat.com> cc: Casey Schaufler <casey@schaufler-ca.com> cc: Stephen Smalley <sds@tycho.nsa.gov> cc: linux-security-module@vger.kernel.org --- include/linux/lsm_hooks.h | 10 ++++++++++ include/linux/security.h | 10 ++++++++++ security/security.c | 6 ++++++ 3 files changed, 26 insertions(+) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 19108185b254..e9f1f69cd04d 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1426,6 +1426,12 @@ * from devices (as a global set). * @watch: The watch object * + * @post_notification: + * Check to see if a watch notification can be posted to a particular + * queue. + * @w_cred: The credentials of the whoever set the watch. + * @cred: The event-triggerer's credentials + * @n: The notification being posted * * Security hooks for using the eBPF maps and programs functionalities through * eBPF syscalls. 
@@ -1705,6 +1711,9 @@ union security_list_options { #ifdef CONFIG_WATCH_QUEUE int (*watch_key)(struct watch *watch, struct key *key); int (*watch_devices)(struct watch *watch); + int (*post_notification)(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n); #endif /* CONFIG_WATCH_QUEUE */ #ifdef CONFIG_SECURITY_NETWORK @@ -1985,6 +1994,7 @@ struct security_hook_heads { #ifdef CONFIG_WATCH_QUEUE struct hlist_head watch_key; struct hlist_head watch_devices; + struct hlist_head post_notification; #endif /* CONFIG_WATCH_QUEUE */ #ifdef CONFIG_SECURITY_NETWORK struct hlist_head unix_stream_connect; diff --git a/include/linux/security.h b/include/linux/security.h index feeade454308..003437714eee 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -58,6 +58,7 @@ struct fs_context; struct fs_parameter; enum fs_value_type; struct watch; +struct watch_notification; /* Default (no) options for the capable function */ #define CAP_OPT_NONE 0x0 @@ -396,6 +397,9 @@ int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen); #ifdef CONFIG_WATCH_QUEUE int security_watch_key(struct watch *watch, struct key *key); int security_watch_devices(struct watch *watch); +int security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n); #endif /* CONFIG_WATCH_QUEUE */ #else /* CONFIG_SECURITY */ @@ -1218,6 +1222,12 @@ static inline int security_watch_devices(struct watch *watch) { return 0; } +static inline int security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n) +{ + return 0; +} #endif /* CONFIG_WATCH_QUEUE */ #endif /* CONFIG_SECURITY */ diff --git a/security/security.c b/security/security.c index 1ebd2c936a57..5afe966aea4e 100644 --- a/security/security.c +++ b/security/security.c @@ -1927,6 +1927,12 @@ int security_watch_devices(struct watch *watch) return call_int_hook(watch_devices, 0, watch); } +int 
security_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n) +{ + return call_int_hook(post_notification, 0, w_cred, cred, n); +} #endif /* CONFIG_WATCH_QUEUE */ #ifdef CONFIG_SECURITY_NETWORK
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #6] From: David Howells @ 2019-08-29 18:30 UTC To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Add a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg.
mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 892 ++++++++++++++++++++++++++++++++++ include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 7 files changed, 1464 insertions(+) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============================ +Mappable notifications queue +============================ + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. 
This can be used in conjunction with:: + + * Key/keyring notifications + + * General device event notifications + + +The notification buffers can be enabled by: + + "Device Drivers"/"Misc devices"/"Mappable notification queue" + (CONFIG_WATCH_QUEUE) + +This document has the following sections: + +.. contents:: :local: + + +Overview +======== + +This facility appears as a misc device file that is opened and then mapped and +polled. Each time it is opened, it creates a new buffer specific to the +returned file descriptor. Then, when the opening process sets watches, it +indicates the particular buffer it wants notifications from that watch to be +written into. Note that there are no read() and write() methods (except for +debugging). The user is expected to access the ring directly and to use poll +to wait for new data. + +If a watch is in place, notifications are only written into the buffer if the +filter criteria are passed and if there's sufficient space available in the +ring. If either of those is not so, the notification will be discarded. In the +latter case, an overrun indicator will also be set. + +Note that when producing a notification, the kernel does not wait for the +consumers to collect it, but rather just continues on. This means that +notifications can be generated whilst spinlocks are held and also protects the +kernel from being held up indefinitely by a userspace malfunction. + +As far as the ring goes, the head index belongs to the kernel and the tail +index belongs to userspace. The kernel will refuse to write anything if the +tail index becomes invalid. Userspace *must* use appropriate memory barriers +between reading or updating the tail index and reading the ring.
+ + +Record Structure +================ + +Notification records in the ring may occupy a variable number of slots within +the buffer, beginning with a 1-slot header:: + + struct watch_notification { + __u32 type:24; + __u32 subtype:8; + __u32 info; + } __attribute__((aligned(WATCH_LENGTH_GRANULARITY))); + +"type" indicates the source of the notification record and "subtype" indicates +the type of record from that source (see the Watch Sources section below). The +type may also be "WATCH_TYPE_META". This is a special record type generated +internally by the watch queue driver itself. There are two subtypes: + + * WATCH_META_SKIP_NOTIFICATION + * WATCH_META_REMOVAL_NOTIFICATION + +The former indicates a record that should just be skipped (padding or +metadata) and the latter indicates that an object on which a watch was +installed was removed or destroyed. + +"info" carries several fields, including: + + * The length of the record in units of buffer slots (mask with + WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates + the size of the record, which may be between 1 and 63 slots. To turn this + into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY. + + * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT). + This is the caller's ID for the watch, which may be between 0 + and 255. Multiple watches may share a queue, and this provides a means to + distinguish them. + + * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST) + that indicates that some notifications were lost for some reason, including + buffer overrun, insufficient memory and inconsistent tail index. + + * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the + notification producer to indicate some meaning specific to the type and + subtype. + +Everything in info apart from the length can be used for filtering.
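[Not part of the patch.] The mask-and-shift decoding of "info" described above can be sketched in plain C. The EX_* masks and shifts below are assumptions picked to match the stated ranges (1 to 63 slots, IDs 0 to 255); the real WATCH_INFO_* values live in include/uapi/linux/watch_queue.h, which is not quoted in this section, and the ex_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only; the authoritative masks are in the uapi header.
 * A 6-bit length field is assumed because records span 1 to 63 slots, and an
 * 8-bit ID field because watch IDs run from 0 to 255. */
#define EX_INFO_LENGTH		0x0000003fu
#define EX_INFO_LENGTH__SHIFT	0
#define EX_INFO_ID		0x0000ff00u
#define EX_INFO_ID__SHIFT	8
#define EX_LENGTH_GRANULARITY	8u	/* slot size in bytes, as stated above */

/* Build an info word from a slot count and a watch ID. */
static uint32_t ex_make_info(unsigned int slots, unsigned int id)
{
	assert(slots >= 1 && slots <= 63);
	assert(id <= 255);
	return ((uint32_t)slots << EX_INFO_LENGTH__SHIFT) |
	       ((uint32_t)id << EX_INFO_ID__SHIFT);
}

/* Record length in buffer slots. */
static unsigned int ex_record_slots(uint32_t info)
{
	return (info & EX_INFO_LENGTH) >> EX_INFO_LENGTH__SHIFT;
}

/* Record length in bytes. */
static unsigned int ex_record_bytes(uint32_t info)
{
	return ex_record_slots(info) * EX_LENGTH_GRANULARITY;
}

/* Caller-assigned watch ID. */
static unsigned int ex_watch_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID__SHIFT;
}
```

A consumer would call ex_record_slots() on each header to find where the next record starts, and ex_watch_id() to tell apart watches that share one queue.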
+ + +Ring Structure +============== + +The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The +caller uses an ioctl() to set the size of the ring after opening and this must +be a power-of-2 multiple of the system page size (so that the mask can be used +with AND). + +The head and tail indices are stored in the first two slots in the ring, which +are marked out as a skippable entry:: + + struct watch_queue_buffer { + union { + struct { + struct watch_notification watch; + volatile __u32 head; + volatile __u32 tail; + __u32 mask; + } meta; + struct watch_notification slots[0]; + }; + }; + +In "meta.watch", type will be set to WATCH_TYPE_META and subtype to +WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just +skip this record. Also, because this record is here, records cannot wrap round +the end of the buffer, so a skippable padding element will be inserted at the +end of the buffer if needed. Thus the contents of a notification record in the +buffer are always contiguous. + +"meta.mask" is an AND'able mask to turn the index counters into slots array +indices. + +The buffer is empty if "meta.head" == "meta.tail". + +[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into +"slots[]" not byte offsets into the buffer. + +[!] NOTE that userspace must never change the head pointer. This belongs to +the kernel and will be updated by that. The kernel will never change the tail +pointer. + +[!] NOTE that userspace must never AND-off the tail pointer before updating it, +but should just keep adding to it and letting it wrap naturally. The value +*should* be masked off when used as an index into slots[]. + +[!] NOTE that if the distance between head and tail becomes too great, the +kernel will assume the buffer is full and write no more until the issue is +resolved. 
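[Not part of the patch.] The index rules in the notes above (free-running 32-bit counters, masked only when used as an index into slots[]) can be illustrated with a few helpers. This is a sketch under the assumption that head and tail are plain unsigned 32-bit counters as shown in struct watch_queue_buffer; the ex_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for a ring of nr_slots slots; nr_slots must be a power of two. */
static uint32_t ex_ring_mask(uint32_t nr_slots)
{
	assert(nr_slots != 0 && (nr_slots & (nr_slots - 1)) == 0);
	return nr_slots - 1;
}

/* Occupied slots. Unsigned subtraction stays correct when the counters
 * wrap past UINT32_MAX, which is why they are never masked in place. */
static uint32_t ex_ring_used(uint32_t head, uint32_t tail)
{
	return head - tail;
}

/* The ring is empty when the two counters are equal. */
static int ex_ring_empty(uint32_t head, uint32_t tail)
{
	return head == tail;
}

/* Turn a free-running counter into an index into slots[]. */
static uint32_t ex_ring_slot(uint32_t index, uint32_t mask)
{
	return index & mask;
}

/* Advance the tail by a record length, letting it wrap naturally. */
static uint32_t ex_ring_advance(uint32_t tail, uint32_t len)
{
	return tail + len;
}
```

For example, with a tail of 0xfffffffe and a head of 4, six slots are in use even though the head is numerically smaller than the tail.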
+ + +Watch List (Notification Source) API +==================================== + +A "watch list" is a list of watchers that are subscribed to a source of +notifications. A list may be attached to an object (say a key or a superblock) +or may be global (say for device events). From a userspace perspective, a +non-global watch list is typically referred to by reference to the object it +belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to +watch that specific key). + +To manage a watch list, the following functions are provided: + + * ``void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *wlist));`` + + Initialise a watch list. If ``release_watch`` is not NULL, then this + indicates a function that should be called when the watch_list object is + destroyed to discard any references the watch list holds on the watched + object. + + * ``void remove_watch_list(struct watch_list *wlist);`` + + This removes all of the watches subscribed to a watch_list and frees them + and then destroys the watch_list object itself. + + +Watch Queue (Notification Buffer) API +===================================== + +A "watch queue" is the buffer allocated by or on behalf of the application that +notification records will be written into. The workings of this are hidden +entirely inside of the watch_queue device driver, but it is necessary to gain a +reference to it to place a watch. These can be managed with: + + * ``struct watch_queue *get_watch_queue(int fd);`` + + Since watch queues are indicated to the kernel by the fd of the character + device that implements the buffer, userspace must hand that fd through a + system call. This can be used to look up an opaque pointer to the watch + queue from the system call. + + * ``void put_watch_queue(struct watch_queue *wqueue);`` + + This discards the reference obtained from ``get_watch_queue()``. 
+ + +Watch Subscription API +====================== + +A "watch" is a subscription on a watch list, indicating the watch queue, and +thus the buffer, into which notification records should be written. The watch +queue object may also carry filtering rules for that object, as set by +userspace. Some parts of the watch struct can be set by the driver:: + + struct watch { + union { + u32 info_id; /* ID to be OR'd in to info field */ + ... + }; + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + ... + }; + +The ``info_id`` value should be an 8-bit number obtained from userspace and +shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of +struct watch_notification::info when and if the notification is written into +the associated watch queue buffer. + +The ``private`` field is the driver's data associated with the watch_list and +is cleaned up by the ``watch_list::release_watch()`` method. + +The ``id`` field is the source's ID. Notifications that are posted with a +different ID are ignored. + +The following functions are provided to manage watches: + + * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);`` + + Initialise a watch object, setting its pointer to the watch queue, using + appropriate barriering to avoid lockdep complaints. + + * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);`` + + Subscribe a watch to a watch list (notification source). The + driver-settable fields in the watch struct must have been set before this + is called. + + * ``int remove_watch_from_object(struct watch_list *wlist, + struct watch_queue *wqueue, + u64 id, false);`` + + Remove a watch from a watch list, where the watch must match the specified + watch queue (``wqueue``) and object identifier (``id``). A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to + indicate that the watch got removed. 
+ + * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);`` + + Remove all the watches from a watch list. It is expected that this will be + called preparatory to destruction and that the watch list will be + inaccessible to new watches by this point. A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each + subscribed watch to indicate that the watch got removed. + + +Notification Posting API +======================== + +To post a notification to a watch list so that the subscribed watches can see +it, the following function should be used:: + + void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id); + +The notification should be preformatted and a pointer to the header (``n``) +should be passed in. The notification may be larger than this and the size in +units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``. + +The ``cred`` struct indicates the credentials of the source (subject) and is +passed to the LSMs, such as SELinux, to allow or suppress the recording of the +note in each individual queue according to the credentials of that queue +(object). + +The ``id`` is the ID of the source object (such as the serial number on a key). +Only watches that have the same ID set in them will see this notification. + + +Watch Sources +============= + +Any particular buffer can be fed from multiple sources. Sources include: + + * WATCH_TYPE_KEY_NOTIFY + + Notifications of this type indicate changes to keys and keyrings, including + changes to keyring contents or to the attributes of keys. + + See Documentation/security/keys/core.rst for more information. + + * WATCH_TYPE_BLOCK_NOTIFY + + Notifications of this type indicate block layer events, such as I/O errors + or temporary link loss. Watches of this type are set on a global queue.
+ + +Event Filtering +=============== + +Once a watch queue has been created, a set of filters can be applied to limit +the events that are received using:: + + struct watch_notification_filter filter = { + ... + }; + ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) + +The filter description is a variable of type:: + + struct watch_notification_filter { + __u32 nr_filters; + __u32 __reserved; + struct watch_notification_type_filter filters[]; + }; + +Where "nr_filters" is the number of filters in filters[] and "__reserved" +should be 0. The "filters" array has elements of the following type:: + + struct watch_notification_type_filter { + __u32 type; + __u32 info_filter; + __u32 info_mask; + __u32 subtype_filter[8]; + }; + +Where: + + * ``type`` is the event type to filter for and should be something like + "WATCH_TYPE_KEY_NOTIFY" + + * ``info_filter`` and ``info_mask`` act as a filter on the info field of the + notification record. The notification is only written into the buffer if:: + + (watch.info & info_mask) == info_filter + + This could be used, for example, to ignore events that are not exactly on + the watched point in a mount tree. + + * ``subtype_filter`` is a bitmask indicating the subtypes that are of + interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to + subtype 1, and so on. + +If the argument to the ioctl() is NULL, then the filters will be removed and +all events from the watched sources will come through. + + +Waiting For Events +================== + +The file descriptor that holds the buffer may be used with poll() and similar. +POLLIN and POLLRDNORM are set if the buffer indices differ. POLLERR is set if +the buffer indices are further apart than the size of the buffer. Wake-up +events are only generated if the buffer is transitioned from an empty state. 
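[Not part of the patch.] The subtype bitmask and the info test from the Event Filtering section above can be modelled in userspace as follows. The struct repeats the layout of struct watch_notification_type_filter under an ex_ prefix so that it builds without the patched header, and the acceptance test follows the same three conditions as filter_watch_notification() later in this patch. All ex_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same fields as struct watch_notification_type_filter quoted above. */
struct ex_type_filter {
	uint32_t type;
	uint32_t info_filter;
	uint32_t info_mask;
	uint32_t subtype_filter[8];	/* 8 x 32 bits = 256 subtypes */
};

/* Mark a subtype as interesting: bit (subtype % 32) of word (subtype / 32). */
static void ex_filter_want_subtype(struct ex_type_filter *f,
				   unsigned int subtype)
{
	assert(subtype < 256);
	f->subtype_filter[subtype >> 5] |= 1u << (subtype & 31);
}

/* Non-zero if a record with this type, subtype and info passes the filter. */
static int ex_filter_accepts(const struct ex_type_filter *f, uint32_t type,
			     unsigned int subtype, uint32_t info)
{
	assert(subtype < 256);
	return type == f->type &&
	       (f->subtype_filter[subtype >> 5] & (1u << (subtype & 31))) != 0 &&
	       (info & f->info_mask) == f->info_filter;
}
```

Leaving info_mask and info_filter at zero accepts any info value, so a filter that only cares about subtypes needs to set nothing but the bitmask.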
+ + +Userspace Code Example +====================== + +A buffer is created with something like the following:: + + fd = open("/dev/watch_queue", O_RDWR); + + #define BUF_SIZE 4 + ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); + + page_size = sysconf(_SC_PAGESIZE); + buf = mmap(NULL, BUF_SIZE * page_size, + PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + +It can then be set to receive keyring change notifications and device event +notifications:: + + keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01); + + watch_devices(fd, 0x2); + +The notifications can then be consumed by something like the following:: + + extern void saw_key_change(struct watch_notification *n); + extern void saw_block_event(struct watch_notification *n); + extern void saw_usb_event(struct watch_notification *n); + + static int consumer(int fd, struct watch_queue_buffer *buf) + { + struct watch_notification *n; + struct pollfd p[1]; + unsigned int len, head, tail, mask = buf->meta.mask; + + for (;;) { + p[0].fd = fd; + p[0].events = POLLIN | POLLERR; + p[0].revents = 0; + + if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR) + goto went_wrong; + + while (head = _atomic_load_acquire(buf->meta.head), + tail = buf->meta.tail, + tail != head + ) { + n = &buf->slots[tail & mask]; + len = (n->info & WATCH_INFO_LENGTH) >> + WATCH_INFO_LENGTH__SHIFT; + if (len == 0) + goto went_wrong; + + switch (n->type) { + case WATCH_TYPE_KEY_NOTIFY: + saw_key_change(n); + break; + case WATCH_TYPE_BLOCK_NOTIFY: + saw_block_event(n); + break; + case WATCH_TYPE_USB_NOTIFY: + saw_usb_event(n); + break; + } + + tail += len; + _atomic_store_release(buf->meta.tail, tail); + } + } + + went_wrong: + return 0; + } + +Note the memory barriers when loading the head pointer and storing the tail +pointer! 
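[Not part of the patch.] The consumer above uses _atomic_load_acquire() and _atomic_store_release() as placeholders. One way to express the same barriers is with C11 <stdatomic.h>, sketched here against a simplified in-memory ring. As an assumption made for brevity, head, tail and mask are separate fields rather than living inside the first slots as in struct watch_queue_buffer, the length field is taken to be the low 6 bits of info, and every ex_* name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define EX_NR_SLOTS	16u		/* must be a power of two */
#define EX_INFO_LENGTH	0x3fu		/* assumed length mask, shift 0 */

struct ex_slot {
	uint32_t type_subtype;		/* stands in for type:24, subtype:8 */
	uint32_t info;
};

struct ex_ring {
	_Atomic uint32_t head;		/* written by the producer only */
	_Atomic uint32_t tail;		/* written by the consumer only */
	uint32_t mask;			/* EX_NR_SLOTS - 1 */
	struct ex_slot slots[EX_NR_SLOTS];
};

/* Step over every pending record. Returns the number of records seen, or
 * -1 if a zero-length record is found (the ring is corrupt). */
static int ex_drain(struct ex_ring *r)
{
	uint32_t head;
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	int seen = 0;

	/* Acquire pairs with the producer's release store of head, so the
	 * record contents are visible before they are read. */
	while ((head = atomic_load_explicit(&r->head,
					    memory_order_acquire)) != tail) {
		const struct ex_slot *n = &r->slots[tail & r->mask];
		uint32_t len = n->info & EX_INFO_LENGTH;

		if (len == 0)
			return -1;
		seen++;

		/* Release makes sure the reads above finish before the
		 * producer is told the slots are free again. */
		tail += len;
		atomic_store_explicit(&r->tail, tail, memory_order_release);
	}
	return seen;
}
```

The tail is never masked before being stored back, in line with the ring notes earlier in the document; only the slot lookup applies the mask.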
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..287e7631feaf --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,892 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or + * userspace mucked up the ring indices). + */ +static bool write_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + struct watch_queue_buffer *buf = wqueue->buffer; + struct watch_notification *p; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + unsigned int size = wqueue->size, mask = size - 1; + unsigned int len; + unsigned int ring_tail, tail, head, used, gap, h; + + ring_tail = READ_ONCE(buf->meta.tail); + head = READ_ONCE(buf->meta.head); + used = head - ring_tail; + + /* Check to see if userspace mucked up the pointers */ + if (used >= size) + goto lost_event; /* Inconsistent */ + tail = ring_tail & mask; + if (tail > 0 && tail < metalen) + goto lost_event; /* Inconsistent */ + + len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + h = head & mask; + if (h >= tail) { + /* Head is at or after tail in the buffer. There may then be + * two gaps: one to the end of buffer and one at the beginning + * of the buffer between the metadata block and the tail + * pointer. + */ + gap = size - h; + if (len > gap) { + /* Not enough space in the post-head gap; we need to + * wrap. When wrapping, we will have to skip the + * metadata at the beginning of the buffer. + */ + if (len > tail - metalen) + goto lost_event; /* Overrun */ + + /* Fill the space at the end of the page */ + p = &buf->slots[h]; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = gap << WATCH_INFO_LENGTH__SHIFT; + head += gap; + h = 0; + if (h >= tail) + goto lost_event; /* Overrun */ + } + } + + if (h == 0) { + /* Reset and skip the header metadata */ + p = &buf->meta.watch; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = metalen << WATCH_INFO_LENGTH__SHIFT; + head += metalen; + h = metalen; + if (h == tail) + goto lost_event; /* Overrun */ + } + + if (h < tail) { + /* Head is before tail in the buffer. 
*/ + gap = tail - h; + if (len > gap) + goto lost_event; /* Overrun */ + } + + n->info |= wqueue->flag_next; + wqueue->flag_next = 0; + p = &buf->slots[h]; + memcpy(p, n, len * gran); + head += len; + + smp_store_release(&buf->meta.head, head); + if (used == 0) + wake_up(&wqueue->waiters); + return true; + +lost_event: + WRITE_ONCE(buf->meta.watch.info, + buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST); + return false; +} + +/* + * Post a notification to a watch queue. + */ +static bool post_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + bool done = false; + + if (!wqueue->buffer) + return false; + + spin_lock_bh(&wqueue->lock); /* Protect head pointer */ + + if (!wqueue->defunct) + done = write_one_notification(wqueue, n); + spin_unlock_bh(&wqueue->lock); + return done; +} + +/* + * Apply filter rules to a notification. + */ +static bool filter_watch_notification(const struct watch_filter *wf, + const struct watch_notification *n) +{ + const struct watch_type_filter *wt; + int i; + + if (!test_bit(n->type, wf->type_filter)) + return false; + + for (i = 0; i < wf->nr_filters; i++) { + wt = &wf->filters[i]; + if (n->type == wt->type && + (wt->subtype_filter[n->subtype >> 5] & + (1U << (n->subtype & 31))) && + (n->info & wt->info_mask) == wt->info_filter) + return true; + } + + return false; /* If there is a filter, the default is to reject. */ +} + +/** + * __post_watch_notification - Post an event notification + * @wlist: The watch list to post the event to. + * @n: The notification record to post. + * @cred: The creds of the process that triggered the notification. + * @id: The ID to match on the watch. + * + * Post a notification of an event into a set of watch queues and let the users + * know. + * + * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and + * should be in units of sizeof(*n). 
+ */ +void __post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + const struct watch_filter *wf; + struct watch_queue *wqueue; + struct watch *watch; + + if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) { + WARN_ON(1); + return; + } + + rcu_read_lock(); + + hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) { + if (watch->id != id) + continue; + n->info &= ~WATCH_INFO_ID; + n->info |= watch->info_id; + + wqueue = rcu_dereference(watch->queue); + wf = rcu_dereference(wqueue->filter); + if (wf && !filter_watch_notification(wf, n)) + continue; + + if (security_post_notification(watch->cred, cred, n) < 0) + continue; + + post_one_notification(wqueue, n); + } + + rcu_read_unlock(); +} +EXPORT_SYMBOL(__post_watch_notification); + +/* + * Allow the queue to be polled. + */ +static __poll_t watch_queue_poll(struct file *file, poll_table *wait) +{ + struct watch_queue *wqueue = file->private_data; + struct watch_queue_buffer *buf = wqueue->buffer; + unsigned int head, tail; + __poll_t mask = 0; + + if (!buf) + return EPOLLERR; + + poll_wait(file, &wqueue->waiters, wait); + + head = READ_ONCE(buf->meta.head); + tail = READ_ONCE(buf->meta.tail); + if (head != tail) + mask |= EPOLLIN | EPOLLRDNORM; + if (head - tail > wqueue->size) + mask |= EPOLLERR; + return mask; +} + +static int watch_queue_set_page_dirty(struct page *page) +{ + SetPageDirty(page); + return 0; +} + +static const struct address_space_operations watch_queue_aops = { + .set_page_dirty = watch_queue_set_page_dirty, +}; + +static vm_fault_t watch_queue_fault(struct vm_fault *vmf) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + page = wqueue->pages[vmf->pgoff]; + get_page(page); + if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { + put_page(page); + return VM_FAULT_RETRY; + } + vmf->page = page; + return VM_FAULT_LOCKED; +} + +static int 
watch_queue_account_mem(struct watch_queue *wqueue, + unsigned long nr_pages) +{ + struct user_struct *user = wqueue->owner; + unsigned long page_limit, cur_pages, new_pages; + + /* Don't allow more pages than we can safely lock */ + page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + cur_pages = atomic_long_read(&user->locked_vm); + + do { + new_pages = cur_pages + nr_pages; + if (new_pages > page_limit && !capable(CAP_IPC_LOCK)) + return -ENOMEM; + } while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages, + new_pages)); + + wqueue->nr_pages = nr_pages; + return 0; +} + +static void watch_queue_unaccount_mem(struct watch_queue *wqueue) +{ + struct user_struct *user = wqueue->owner; + + if (wqueue->nr_pages) { + atomic_long_sub(wqueue->nr_pages, &user->locked_vm); + wqueue->nr_pages = 0; + } +} + +static void watch_queue_map_pages(struct vm_fault *vmf, + pgoff_t start_pgoff, pgoff_t end_pgoff) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + rcu_read_lock(); + + do { + page = wqueue->pages[start_pgoff]; + if (trylock_page(page)) { + vm_fault_t ret; + get_page(page); + ret = alloc_set_pte(vmf, NULL, page); + if (ret != 0) + put_page(page); + + unlock_page(page); + } + } while (++start_pgoff < end_pgoff); + + rcu_read_unlock(); +} + +static const struct vm_operations_struct watch_queue_vm_ops = { + .fault = watch_queue_fault, + .map_pages = watch_queue_map_pages, +}; + +/* + * Map the buffer.
+ */ +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + u8 nr_pages; + + inode_lock(inode); + nr_pages = wqueue->nr_pages; + inode_unlock(inode); + + if (nr_pages == 0 || + vma->vm_pgoff != 0 || + vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE || + !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED))) + return -EINVAL; + + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_ops = &watch_queue_vm_ops; + return 0; +} + +/* + * Allocate the required number of pages. + */ +static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages) +{ + struct watch_queue_buffer *buf; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + int i; + + BUILD_BUG_ON(gran != sizeof(__u64)); + + if (wqueue->buffer) + return -EBUSY; + + if (nr_pages == 0 || + nr_pages > 16 || /* TODO: choose a better hard limit */ + !is_power_of_2(nr_pages)) + return -EINVAL; + + if (watch_queue_account_mem(wqueue, nr_pages) < 0) + goto err; + + wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!wqueue->pages) + goto err_unaccount; + + for (i = 0; i < nr_pages; i++) { + wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!wqueue->pages[i]) + goto err_some_pages; + wqueue->pages[i]->mapping = &wqueue->mapping; + SetPageUptodate(wqueue->pages[i]); + } + + buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED); + if (!buf) + goto err_some_pages; + + wqueue->buffer = buf; + wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification)); + + /* The first four slots in the buffer contain metadata about the ring, + * including the head and tail indices and mask. 
+ */ + buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT; + buf->meta.watch.type = WATCH_TYPE_META; + buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION; + buf->meta.mask = wqueue->size - 1; + buf->meta.head = metalen; + buf->meta.tail = metalen; + return 0; + +err_some_pages: + for (i--; i >= 0; i--) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + put_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + wqueue->pages = NULL; +err_unaccount: + watch_queue_unaccount_mem(wqueue); +err: + return -ENOMEM; +} + +/* + * Set the filter on a watch queue. + */ +static long watch_queue_set_filter(struct inode *inode, + struct watch_queue *wqueue, + struct watch_notification_filter __user *_filter) +{ + struct watch_notification_type_filter *tf; + struct watch_notification_filter filter; + struct watch_type_filter *q; + struct watch_filter *wfilter; + int ret, nr_filter = 0, i; + + if (!_filter) { + /* Remove the old filter */ + wfilter = NULL; + goto set; + } + + /* Grab the user's filter specification */ + if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) + return -EFAULT; + if (filter.nr_filters == 0 || + filter.nr_filters > 16 || + filter.__reserved != 0) + return -EINVAL; + + tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + if (IS_ERR(tf)) + return PTR_ERR(tf); + + ret = -EINVAL; + for (i = 0; i < filter.nr_filters; i++) { + if ((tf[i].info_filter & ~tf[i].info_mask) || + tf[i].info_mask & WATCH_INFO_LENGTH) + goto err_filter; + /* Ignore any unknown types */ + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + nr_filter++; + } + + /* Now we need to build the internal filter from only the relevant + * user-specified filters. 
+ */ + ret = -ENOMEM; + wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL); + if (!wfilter) + goto err_filter; + wfilter->nr_filters = nr_filter; + + q = wfilter->filters; + for (i = 0; i < filter.nr_filters; i++) { + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + + q->type = tf[i].type; + q->info_filter = tf[i].info_filter; + q->info_mask = tf[i].info_mask; + q->subtype_filter[0] = tf[i].subtype_filter[0]; + __set_bit(q->type, wfilter->type_filter); + q++; + } + + kfree(tf); +set: + inode_lock(inode); + rcu_swap_protected(wqueue->filter, wfilter, + lockdep_is_held(&inode->i_rwsem)); + inode_unlock(inode); + if (wfilter) + kfree_rcu(wfilter, rcu); + return 0; + +err_filter: + kfree(tf); + return ret; +} + +/* + * Set parameters. + */ +static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + long ret; + + switch (cmd) { + case IOC_WATCH_QUEUE_SET_SIZE: + inode_lock(inode); + ret = watch_queue_set_size(wqueue, arg); + inode_unlock(inode); + return ret; + + case IOC_WATCH_QUEUE_SET_FILTER: + ret = watch_queue_set_filter( + inode, wqueue, + (struct watch_notification_filter __user *)arg); + return ret; + + default: + return -ENOTTY; + } +} + +/* + * Open the file.
+ */ +static int watch_queue_open(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue; + + wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL); + if (!wqueue) + return -ENOMEM; + + wqueue->mapping.a_ops = &watch_queue_aops; + wqueue->mapping.i_mmap = RB_ROOT_CACHED; + init_rwsem(&wqueue->mapping.i_mmap_rwsem); + spin_lock_init(&wqueue->mapping.private_lock); + + kref_init(&wqueue->usage); + spin_lock_init(&wqueue->lock); + init_waitqueue_head(&wqueue->waiters); + wqueue->owner = get_uid(file->f_cred->user); + + file->private_data = wqueue; + return 0; +} + +static void __put_watch_queue(struct kref *kref) +{ + struct watch_queue *wqueue = + container_of(kref, struct watch_queue, usage); + struct watch_filter *wfilter; + + wfilter = rcu_access_pointer(wqueue->filter); + if (wfilter) + kfree_rcu(wfilter, rcu); + free_uid(wqueue->owner); + kfree_rcu(wqueue, rcu); +} + +/** + * put_watch_queue - Dispose of a ref on a watchqueue. + * @wqueue: The watch queue to unref. + */ +void put_watch_queue(struct watch_queue *wqueue) +{ + kref_put(&wqueue->usage, __put_watch_queue); +} +EXPORT_SYMBOL(put_watch_queue); + +static void free_watch(struct rcu_head *rcu) +{ + struct watch *watch = container_of(rcu, struct watch, rcu); + + put_watch_queue(rcu_access_pointer(watch->queue)); + put_cred(watch->cred); +} + +static void __put_watch(struct kref *kref) +{ + struct watch *watch = container_of(kref, struct watch, usage); + + call_rcu(&watch->rcu, free_watch); +} + +/* + * Discard a watch. + */ +static void put_watch(struct watch *watch) +{ + kref_put(&watch->usage, __put_watch); +} + +/** + * init_watch - Initialise a watch + * @watch: The watch to initialise. + * @wqueue: The queue to assign. + * + * Initialise a watch and set the watch queue.
+ */ +void init_watch(struct watch *watch, struct watch_queue *wqueue) +{ + kref_init(&watch->usage); + INIT_HLIST_NODE(&watch->list_node); + INIT_HLIST_NODE(&watch->queue_node); + rcu_assign_pointer(watch->queue, wqueue); +} + +/** + * add_watch_to_object - Add a watch on an object to a watch list + * @watch: The watch to add + * @wlist: The watch list to add to + * + * @watch->queue must have been set to point to the queue to post notifications + * to and the watch list of the object to be watched. @watch->cred must also + * have been set to the appropriate credentials and a ref taken on them. + * + * The caller must pin the queue and the list both and must hold the list + * locked against racing watch additions/removals. + */ +int add_watch_to_object(struct watch *watch, struct watch_list *wlist) +{ + struct watch_queue *wqueue = rcu_access_pointer(watch->queue); + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + rcu_assign_pointer(watch->watch_list, wlist); + + spin_lock_bh(&wqueue->lock); + kref_get(&wqueue->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_unlock_bh(&wqueue->lock); + + hlist_add_head(&watch->list_node, &wlist->watchers); + return 0; +} +EXPORT_SYMBOL(add_watch_to_object); + +/** + * remove_watch_from_object - Remove a watch or all watches from an object. + * @wlist: The watch list to remove from + * @wq: The watch queue of interest (ignored if @all is true) + * @id: The ID of the watch to remove (ignored if @all is true) + * @all: True to remove all objects + * + * Remove a specific watch or all watches from an object. A notification is + * sent to the watcher to tell them that this happened. 
+ */ +int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, + u64 id, bool all) +{ + struct watch_notification_removal n; + struct watch_queue *wqueue; + struct watch *watch; + int ret = -EBADSLT; + + rcu_read_lock(); + +again: + spin_lock(&wlist->lock); + hlist_for_each_entry(watch, &wlist->watchers, list_node) { + if (all || + (watch->id == id && rcu_access_pointer(watch->queue) == wq)) + goto found; + } + spin_unlock(&wlist->lock); + goto out; + +found: + ret = 0; + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + spin_unlock(&wlist->lock); + + /* We now own the reference on watch that used to belong to wlist. */ + + n.watch.type = WATCH_TYPE_META; + n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; + n.watch.info = watch->info_id | watch_sizeof(n.watch); + n.id = id; + if (id != 0) + n.watch.info = watch->info_id | watch_sizeof(n); + + wqueue = rcu_dereference(watch->queue); + + /* We don't need the watch list lock for the next bit as RCU is + * protecting *wqueue from deallocation. + */ + if (wqueue) { + post_one_notification(wqueue, &n.watch); + + spin_lock_bh(&wqueue->lock); + + if (!hlist_unhashed(&watch->queue_node)) { + hlist_del_init_rcu(&watch->queue_node); + put_watch(watch); + } + + spin_unlock_bh(&wqueue->lock); + } + + if (wlist->release_watch) { + void (*release_watch)(struct watch *); + + release_watch = wlist->release_watch; + rcu_read_unlock(); + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + + if (all && !hlist_empty(&wlist->watchers)) + goto again; +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(remove_watch_from_object); + +/* + * Remove all the watches that are contributory to a queue. This has the + * potential to race with removal of the watches by the destruction of the + * objects being watched or with the distribution of notifications. 
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */ +static int watch_queue_release(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue = file->private_data; + int i; + + watch_queue_clear(wqueue); + + if (wqueue->buffer) + vunmap(wqueue->buffer); + + for (i = 0; i < wqueue->nr_pages; i++) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + __free_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + watch_queue_unaccount_mem(wqueue); + put_watch_queue(wqueue); + return 0; +} + +static const struct file_operations watch_queue_fops = { + .owner = THIS_MODULE, + .open = watch_queue_open, + .release = watch_queue_release, + .unlocked_ioctl = watch_queue_ioctl, + .poll = watch_queue_poll, + .mmap = watch_queue_mmap, + .llseek = no_llseek, +}; + +/** + * get_watch_queue - Get a watch queue from its file descriptor. + * @fd: The fd to query. + */ +struct watch_queue *get_watch_queue(int fd) +{ + struct watch_queue *wqueue = ERR_PTR(-EBADF); + struct fd f; + + f = fdget(fd); + if (f.file) { + wqueue = ERR_PTR(-EINVAL); + if (f.file->f_op == &watch_queue_fops) { + wqueue = f.file->private_data; + kref_get(&wqueue->usage); + } + fdput(f); + } + + return wqueue; +} +EXPORT_SYMBOL(get_watch_queue); + +static struct miscdevice watch_queue_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "watch_queue", + .fops = &watch_queue_fops, + .mode = 0666, +}; +builtin_misc_device(watch_queue_dev); diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h new file mode 100644 index 000000000000..34d7915cc5b3 --- /dev/null +++ b/include/linux/watch_queue.h @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. + */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + 
+static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. + */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. 
+ * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #6] @ 2019-08-29 18:30 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:30 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Implement a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink).
The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 892 ++++++++++++++++++++++++++++++++++ include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 7 files changed, 1464 insertions(+) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============================ +Mappable notifications queue +============================ + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. 
This can be used in conjunction with: + + * Key/keyring notifications + + * General device event notifications + + +The notifications buffers can be enabled by: + + "Device Drivers"/"Misc devices"/"Mappable notification queue" + (CONFIG_WATCH_QUEUE) + +This document has the following sections: + +.. contents:: :local: + + +Overview +======== + +This facility appears as a misc device file that is opened and then mapped and +polled. Each time it is opened, it creates a new buffer specific to the +returned file descriptor. Then, when the opening process sets watches, it +indicates the particular buffer it wants notifications from that watch to be +written into. Note that there are no read() and write() methods (except for +debugging). The user is expected to access the ring directly and to use poll +to wait for new data. + +If a watch is in place, notifications are only written into the buffer if the +filter criteria are passed and if there's sufficient space available in the +ring. If either condition is not met, the notification will be discarded. In +the latter case, an overrun indicator will also be set. + +Note that when producing a notification, the kernel does not wait for the +consumers to collect it, but rather just continues on. This means that +notifications can be generated whilst spinlocks are held and also protects the +kernel from being held up indefinitely by a userspace malfunction. + +As far as the ring goes, the head index belongs to the kernel and the tail +index belongs to userspace. The kernel will refuse to write anything if the +tail index becomes invalid. Userspace *must* use appropriate memory barriers +between reading or updating the tail index and reading the ring.
+ + +Record Structure +================ + +Notification records in the ring may occupy a variable number of slots within +the buffer, beginning with a 1-slot header:: + + struct watch_notification { + __u32 type:24; + __u32 subtype:8; + __u32 info; + } __attribute__((aligned(WATCH_LENGTH_GRANULARITY))); + +"type" indicates the source of the notification record and "subtype" indicates +the type of record from that source (see the Watch Sources section below). The +type may also be "WATCH_TYPE_META". This is a special record type generated +internally by the watch queue driver itself. There are two subtypes: + + * WATCH_META_SKIP_NOTIFICATION + * WATCH_META_REMOVAL_NOTIFICATION + +The former indicates a record that should just be skipped (padding or +metadata) and the latter indicates that an object on which a watch was +installed was removed or destroyed. + +"info" indicates a bunch of things, including: + + * The length of the record in units of buffer slots (mask with + WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates + the size of the record, which may be between 1 and 63 slots. To turn this + into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY. + + * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT). + This indicates the caller's ID of the watch, which may be between 0 + and 255. Multiple watches may share a queue, and this provides a means to + distinguish them. + + * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST) + that indicates that some notifications were lost for some reason, including + buffer overrun, insufficient memory and inconsistent tail index. + + * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the + notification producer to indicate some meaning specific to the type and + subtype. + +Everything in info apart from the length can be used for filtering.
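As an illustration of the mask-and-shift decoding described above, here is a minimal userspace sketch. The EX_* names and their numeric values are stand-ins assumed purely for this example; they are not taken from the patch, and real code should use the WATCH_INFO_* macros from <linux/watch_queue.h>. Only the 8-byte slot size and the 1 to 63 slot length range come from the text.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in layout, assumed for illustration only.  Real code should use
 * WATCH_INFO_LENGTH, WATCH_INFO_ID and friends from the uapi header. */
#define EX_INFO_LENGTH         0x0000003fu /* record length in slots */
#define EX_INFO_LENGTH_SHIFT   0
#define EX_INFO_ID             0x00ff0000u /* caller-supplied watch ID */
#define EX_INFO_ID_SHIFT       16
#define EX_LENGTH_GRANULARITY  8u          /* slot size in bytes */

/* Number of 8-byte slots the record occupies, header included. */
static unsigned int ex_record_slots(uint32_t info)
{
	return (info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
}

/* Record size in bytes: slots multiplied by the slot granularity. */
static size_t ex_record_bytes(uint32_t info)
{
	return (size_t)ex_record_slots(info) * EX_LENGTH_GRANULARITY;
}

/* The 8-bit ID that distinguishes watches sharing one queue. */
static unsigned int ex_watch_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
}
```

A consumer would typically call ex_record_slots() on every record to find the next one, and ex_watch_id() to route the event to whichever part of the application set that watch.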
+ + +Ring Structure +============== + +The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The +caller uses an ioctl() to set the size of the ring after opening and this must +be a power-of-2 multiple of the system page size (so that the mask can be used +with AND). + +The head and tail indices are stored in the first two slots in the ring, which +are marked out as a skippable entry:: + + struct watch_queue_buffer { + union { + struct { + struct watch_notification watch; + volatile __u32 head; + volatile __u32 tail; + __u32 mask; + } meta; + struct watch_notification slots[0]; + }; + }; + +In "meta.watch", type will be set to WATCH_TYPE_META and subtype to +WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just +skip this record. Also, because this record is here, records cannot wrap round +the end of the buffer, so a skippable padding element will be inserted at the +end of the buffer if needed. Thus the contents of a notification record in the +buffer are always contiguous. + +"meta.mask" is an AND'able mask to turn the index counters into slots array +indices. + +The buffer is empty if "meta.head" == "meta.tail". + +[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into +"slots[]" not byte offsets into the buffer. + +[!] NOTE that userspace must never change the head pointer. This belongs to +the kernel and will be updated by that. The kernel will never change the tail +pointer. + +[!] NOTE that userspace must never AND-off the tail pointer before updating it, +but should just keep adding to it and letting it wrap naturally. The value +*should* be masked off when used as an index into slots[]. + +[!] NOTE that if the distance between head and tail becomes too great, the +kernel will assume the buffer is full and write no more until the issue is +resolved. 
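The rules above (acquire the head, mask only when indexing, let the tail wrap naturally, skip padding records) can be sketched as a small userspace consumer. This is a hedged illustration, not the author's code: the ex_* types mirror the structures shown in the text, the ring size is fixed at 16 slots for the sketch, EX_META_SKIP and the length mask are assumed values, and the GCC/Clang __atomic builtins stand in for whatever barrier primitives an application actually uses.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_RING_SLOTS   16          /* fixed here; really set via ioctl */
#define EX_INFO_LENGTH  0x0000003fu /* assumed length mask, shift of 0 */
#define EX_TYPE_META    0           /* WATCH_TYPE_META is 0 in the patch */
#define EX_META_SKIP    0           /* assumed skip-subtype value */

struct ex_notification {
	uint32_t type:24;
	uint32_t subtype:8;
	uint32_t info;
} __attribute__((aligned(8)));

union ex_ring {
	struct {
		struct ex_notification watch;
		uint32_t head;  /* written by the kernel only */
		uint32_t tail;  /* written by userspace only */
		uint32_t mask;
	} meta;
	struct ex_notification slots[EX_RING_SLOTS];
};

static uint32_t ex_len(const struct ex_notification *n)
{
	return n->info & EX_INFO_LENGTH;
}

/* Return the next real record, or NULL if the ring is empty or looks
 * inconsistent.  Skip records met on the way are consumed. */
static const struct ex_notification *ex_peek(union ex_ring *r)
{
	/* Acquire pairs with the kernel's release store of head. */
	uint32_t head = __atomic_load_n(&r->meta.head, __ATOMIC_ACQUIRE);
	uint32_t tail = r->meta.tail;

	while (tail != head) {
		const struct ex_notification *n =
			&r->slots[tail & r->meta.mask];
		uint32_t len = ex_len(n);

		if (len == 0 || len > head - tail)
			return NULL;
		if (n->type != EX_TYPE_META || n->subtype != EX_META_SKIP)
			return n;
		tail += len; /* unmasked: the index wraps naturally */
		__atomic_store_n(&r->meta.tail, tail, __ATOMIC_RELEASE);
	}
	return NULL;
}

/* Hand the slots of a processed record back to the kernel. */
static void ex_advance(union ex_ring *r, const struct ex_notification *n)
{
	__atomic_store_n(&r->meta.tail, r->meta.tail + ex_len(n),
			 __ATOMIC_RELEASE);
}
```

Splitting the consumer into a peek step and an advance step keeps the record readable until the caller has finished with it; moving the tail first would allow the kernel to overwrite those slots.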
+
+
+Watch List (Notification Source) API
+====================================
+
+A "watch list" is a list of watchers that are subscribed to a source of
+notifications.  A list may be attached to an object (say a key or a superblock)
+or may be global (say for device events).  From a userspace perspective, a
+non-global watch list is typically identified by the object it belongs to
+(such as using KEYCTL_WATCH_KEY and giving it a key serial number to watch
+that specific key).
+
+To manage a watch list, the following functions are provided:
+
+  * ``void init_watch_list(struct watch_list *wlist,
+                           void (*release_watch)(struct watch *watch));``
+
+    Initialise a watch list.  If ``release_watch`` is not NULL, then this
+    indicates a function that should be called when the watch_list object is
+    destroyed to discard any references the watch list holds on the watched
+    object.
+
+  * ``void remove_watch_list(struct watch_list *wlist, u64 id);``
+
+    This removes all of the watches subscribed to a watch_list and frees them
+    and then destroys the watch_list object itself.  ``id`` is the object
+    identifier reported in the removal notification sent to each watcher.
+
+
+Watch Queue (Notification Buffer) API
+=====================================
+
+A "watch queue" is the buffer allocated by or on behalf of the application that
+notification records will be written into.  The workings of this are hidden
+entirely inside of the watch_queue device driver, but it is necessary to gain a
+reference to it to place a watch.  These can be managed with:
+
+  * ``struct watch_queue *get_watch_queue(int fd);``
+
+    Since watch queues are indicated to the kernel by the fd of the character
+    device that implements the buffer, userspace must hand that fd through a
+    system call.  This can be used to look up an opaque pointer to the watch
+    queue from the system call.  An error pointer is returned if the fd is not
+    open (-EBADF) or does not refer to a watch queue (-EINVAL).
+
+  * ``void put_watch_queue(struct watch_queue *wqueue);``
+
+    This discards the reference obtained from ``get_watch_queue()``.
+ + +Watch Subscription API +====================== + +A "watch" is a subscription on a watch list, indicating the watch queue, and +thus the buffer, into which notification records should be written. The watch +queue object may also carry filtering rules for that object, as set by +userspace. Some parts of the watch struct can be set by the driver:: + + struct watch { + union { + u32 info_id; /* ID to be OR'd in to info field */ + ... + }; + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + ... + }; + +The ``info_id`` value should be an 8-bit number obtained from userspace and +shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of +struct watch_notification::info when and if the notification is written into +the associated watch queue buffer. + +The ``private`` field is the driver's data associated with the watch_list and +is cleaned up by the ``watch_list::release_watch()`` method. + +The ``id`` field is the source's ID. Notifications that are posted with a +different ID are ignored. + +The following functions are provided to manage watches: + + * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);`` + + Initialise a watch object, setting its pointer to the watch queue, using + appropriate barriering to avoid lockdep complaints. + + * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);`` + + Subscribe a watch to a watch list (notification source). The + driver-settable fields in the watch struct must have been set before this + is called. + + * ``int remove_watch_from_object(struct watch_list *wlist, + struct watch_queue *wqueue, + u64 id, false);`` + + Remove a watch from a watch list, where the watch must match the specified + watch queue (``wqueue``) and object identifier (``id``). A notification + (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to + indicate that the watch got removed. 
+
+  * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);``
+
+    Remove all the watches from a watch list.  It is expected that this will be
+    called in preparation for destruction and that the watch list will be
+    inaccessible to new watches by this point.  A notification
+    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each
+    subscribed watch to indicate that the watch got removed.
+
+
+Notification Posting API
+========================
+
+To post a notification to a watch list so that the subscribed watches can see
+it, the following function should be used::
+
+	void post_watch_notification(struct watch_list *wlist,
+				     struct watch_notification *n,
+				     const struct cred *cred,
+				     u64 id);
+
+The notification should be preformatted and a pointer to the header (``n``)
+should be passed in.  The record may be larger than the header; its size in
+units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``.
+
+The ``cred`` struct indicates the credentials of the source (subject) and is
+passed to the LSMs, such as SELinux, to allow or suppress the recording of the
+note in each individual queue according to the credentials of that queue
+(object).
+
+The ``id`` is the ID of the source object (such as the serial number on a key).
+Only watches that have the same ID set in them will see this notification.
+
+
+Watch Sources
+=============
+
+Any particular buffer can be fed from multiple sources.  Sources include:
+
+  * WATCH_TYPE_KEY_NOTIFY
+
+    Notifications of this type indicate changes to keys and keyrings, including
+    changes to keyring contents or to the attributes of keys.
+
+    See Documentation/security/keys/core.rst for more information.
+
+  * WATCH_TYPE_BLOCK_NOTIFY
+
+    Notifications of this type indicate block layer events, such as I/O errors
+    or temporary link loss.  Watches of this type are set on a global queue.
+
+
+Event Filtering
+===============
+
+Once a watch queue has been created, a set of filters can be applied to limit
+the events that are received using::
+
+	struct watch_notification_filter filter = {
+		...
+	};
+	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
+
+The filter description is a variable of type::
+
+	struct watch_notification_filter {
+		__u32	nr_filters;
+		__u32	__reserved;
+		struct watch_notification_type_filter filters[];
+	};
+
+Where "nr_filters" is the number of filters in filters[] (between 1 and 16)
+and "__reserved" should be 0.  The "filters" array has elements of the
+following type::
+
+	struct watch_notification_type_filter {
+		__u32	type;
+		__u32	info_filter;
+		__u32	info_mask;
+		__u32	subtype_filter[8];
+	};
+
+Where:
+
+  * ``type`` is the event type to filter for and should be something like
+    "WATCH_TYPE_KEY_NOTIFY"
+
+  * ``info_filter`` and ``info_mask`` act as a filter on the info field of the
+    notification record.  The notification is only written into the buffer if::
+
+	(watch.info & info_mask) == info_filter
+
+    This could be used, for example, to ignore events that are not exactly on
+    the watched point in a mount tree.  The filter is rejected with EINVAL if
+    ``info_filter`` has bits set outside ``info_mask`` or if ``info_mask``
+    covers the length field.
+
+  * ``subtype_filter`` is a bitmask indicating the subtypes that are of
+    interest.  Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to
+    subtype 1, and so on.
+
+If the argument to the ioctl() is NULL, then the filters will be removed and
+all events from the watched sources will come through.
+
+
+Waiting For Events
+==================
+
+The file descriptor that holds the buffer may be used with poll() and similar.
+POLLIN and POLLRDNORM are set if the buffer indices differ.  POLLERR is set if
+the buffer indices are further apart than the size of the buffer.  Wake-up
+events are only generated when the buffer transitions from the empty state.
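The matching rule from the Event Filtering section (type, then subtype bit, then masked info comparison) can be sketched in userspace terms as below. This is an illustration under stated assumptions: the struct and function names are hypothetical, the struct only mirrors the documented field layout of ``watch_notification_type_filter``, and the sketch follows the documented 256-bit subtype bitmap rather than any particular kernel implementation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in mirroring the documented filter element layout. */
struct example_type_filter {
	uint32_t type;			/* Event type to accept */
	uint32_t info_filter;		/* Required value of the masked info bits */
	uint32_t info_mask;		/* Which info bits are compared */
	uint32_t subtype_filter[8];	/* 256 bits, one per subtype */
};

/* Return true if a record with the given header fields passes the filter. */
static bool example_filter_matches(const struct example_type_filter *f,
				   uint32_t type, uint8_t subtype,
				   uint32_t info)
{
	if (type != f->type)
		return false;

	/* Word (subtype / 32), bit (subtype % 32) of the bitmap */
	if (!(f->subtype_filter[subtype >> 5] & (1u << (subtype & 31))))
		return false;

	return (info & f->info_mask) == f->info_filter;
}
```

With several filter elements installed, a record is accepted if any one element matches; a record that matches none is dropped.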
+
+
+Userspace Code Example
+======================
+
+A buffer is created with something like the following::
+
+	fd = open("/dev/watch_queue", O_RDWR);
+
+	#define BUF_SIZE 4	/* Size in pages; must be a power of 2 */
+	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
+
+	page_size = sysconf(_SC_PAGESIZE);
+	buf = mmap(NULL, BUF_SIZE * page_size,
+		   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+It can then be set to receive keyring change notifications and device event
+notifications::
+
+	keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01);
+
+	watch_devices(fd, 0x2);
+
+The notifications can then be consumed by something like the following::
+
+	extern void saw_key_change(struct watch_notification *n);
+	extern void saw_block_event(struct watch_notification *n);
+	extern void saw_usb_event(struct watch_notification *n);
+
+	static int consumer(int fd, struct watch_queue_buffer *buf)
+	{
+		struct watch_notification *n;
+		struct pollfd p[1];
+		unsigned int len, head, tail, mask = buf->meta.mask;
+
+		for (;;) {
+			p[0].fd = fd;
+			p[0].events = POLLIN | POLLERR;
+			p[0].revents = 0;
+
+			if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR)
+				goto went_wrong;
+
+			/* Acquire pairs with the kernel's release store to head */
+			while (head = __atomic_load_n(&buf->meta.head,
+						      __ATOMIC_ACQUIRE),
+			       tail = buf->meta.tail,
+			       tail != head
+			       ) {
+				n = &buf->slots[tail & mask];
+				len = (n->info & WATCH_INFO_LENGTH) >>
+					WATCH_INFO_LENGTH__SHIFT;
+				if (len == 0)
+					goto went_wrong; /* Corrupt record */
+
+				switch (n->type) {
+				case WATCH_TYPE_KEY_NOTIFY:
+					saw_key_change(n);
+					break;
+				case WATCH_TYPE_BLOCK_NOTIFY:
+					saw_block_event(n);
+					break;
+				case WATCH_TYPE_USB_NOTIFY:
+					saw_usb_event(n);
+					break;
+				default:
+					/* Skip WATCH_TYPE_META and unknown types */
+					break;
+				}
+
+				/* Do not mask tail; let it wrap naturally */
+				tail += len;
+				__atomic_store_n(&buf->meta.tail, tail,
+						 __ATOMIC_RELEASE);
+			}
+		}
+
+	went_wrong:
+		return -1;
+	}
+
+Note the memory barriers when loading the head pointer and storing the tail
+pointer!  The example uses the GCC/Clang ``__atomic`` builtins for these.
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..287e7631feaf --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,892 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or + * userspace mucked up the ring indices). + */ +static bool write_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + struct watch_queue_buffer *buf = wqueue->buffer; + struct watch_notification *p; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + unsigned int size = wqueue->size, mask = size - 1; + unsigned int len; + unsigned int ring_tail, tail, head, used, gap, h; + + ring_tail = READ_ONCE(buf->meta.tail); + head = READ_ONCE(buf->meta.head); + used = head - ring_tail; + + /* Check to see if userspace mucked up the pointers */ + if (used >= size) + goto lost_event; /* Inconsistent */ + tail = ring_tail & mask; + if (tail > 0 && tail < metalen) + goto lost_event; /* Inconsistent */ + + len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + h = head & mask; + if (h >= tail) { + /* Head is at or after tail in the buffer. There may then be + * two gaps: one to the end of buffer and one at the beginning + * of the buffer between the metadata block and the tail + * pointer. + */ + gap = size - h; + if (len > gap) { + /* Not enough space in the post-head gap; we need to + * wrap. When wrapping, we will have to skip the + * metadata at the beginning of the buffer. + */ + if (len > tail - metalen) + goto lost_event; /* Overrun */ + + /* Fill the space at the end of the page */ + p = &buf->slots[h]; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = gap << WATCH_INFO_LENGTH__SHIFT; + head += gap; + h = 0; + if (h >= tail) + goto lost_event; /* Overrun */ + } + } + + if (h == 0) { + /* Reset and skip the header metadata */ + p = &buf->meta.watch; + p->type = WATCH_TYPE_META; + p->subtype = WATCH_META_SKIP_NOTIFICATION; + p->info = metalen << WATCH_INFO_LENGTH__SHIFT; + head += metalen; + h = metalen; + if (h == tail) + goto lost_event; /* Overrun */ + } + + if (h < tail) { + /* Head is before tail in the buffer. 
*/ + gap = tail - h; + if (len > gap) + goto lost_event; /* Overrun */ + } + + n->info |= wqueue->flag_next; + wqueue->flag_next = 0; + p = &buf->slots[h]; + memcpy(p, n, len * gran); + head += len; + + smp_store_release(&buf->meta.head, head); + if (used == 0) + wake_up(&wqueue->waiters); + return true; + +lost_event: + WRITE_ONCE(buf->meta.watch.info, + buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST); + return false; +} + +/* + * Post a notification to a watch queue. + */ +static bool post_one_notification(struct watch_queue *wqueue, + struct watch_notification *n) +{ + bool done = false; + + if (!wqueue->buffer) + return false; + + spin_lock_bh(&wqueue->lock); /* Protect head pointer */ + + if (!wqueue->defunct) + done = write_one_notification(wqueue, n); + spin_unlock_bh(&wqueue->lock); + return done; +} + +/* + * Apply filter rules to a notification. + */ +static bool filter_watch_notification(const struct watch_filter *wf, + const struct watch_notification *n) +{ + const struct watch_type_filter *wt; + int i; + + if (!test_bit(n->type, wf->type_filter)) + return false; + + for (i = 0; i < wf->nr_filters; i++) { + wt = &wf->filters[i]; + if (n->type == wt->type && + (wt->subtype_filter[n->subtype >> 5] & + (1U << (n->subtype & 31))) && + (n->info & wt->info_mask) == wt->info_filter) + return true; + } + + return false; /* If there is a filter, the default is to reject. */ +} + +/** + * __post_watch_notification - Post an event notification + * @wlist: The watch list to post the event to. + * @n: The notification record to post. + * @cred: The creds of the process that triggered the notification. + * @id: The ID to match on the watch. + * + * Post a notification of an event into a set of watch queues and let the users + * know. + * + * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and + * should be in units of sizeof(*n). 
+ */ +void __post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + const struct watch_filter *wf; + struct watch_queue *wqueue; + struct watch *watch; + + if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) { + WARN_ON(1); + return; + } + + rcu_read_lock(); + + hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) { + if (watch->id != id) + continue; + n->info &= ~WATCH_INFO_ID; + n->info |= watch->info_id; + + wqueue = rcu_dereference(watch->queue); + wf = rcu_dereference(wqueue->filter); + if (wf && !filter_watch_notification(wf, n)) + continue; + + if (security_post_notification(watch->cred, cred, n) < 0) + continue; + + post_one_notification(wqueue, n); + } + + rcu_read_unlock(); +} +EXPORT_SYMBOL(__post_watch_notification); + +/* + * Allow the queue to be polled. + */ +static __poll_t watch_queue_poll(struct file *file, poll_table *wait) +{ + struct watch_queue *wqueue = file->private_data; + struct watch_queue_buffer *buf = wqueue->buffer; + unsigned int head, tail; + __poll_t mask = 0; + + if (!buf) + return EPOLLERR; + + poll_wait(file, &wqueue->waiters, wait); + + head = READ_ONCE(buf->meta.head); + tail = READ_ONCE(buf->meta.tail); + if (head != tail) + mask |= EPOLLIN | EPOLLRDNORM; + if (head - tail > wqueue->size) + mask |= EPOLLERR; + return mask; +} + +static int watch_queue_set_page_dirty(struct page *page) +{ + SetPageDirty(page); + return 0; +} + +static const struct address_space_operations watch_queue_aops = { + .set_page_dirty = watch_queue_set_page_dirty, +}; + +static vm_fault_t watch_queue_fault(struct vm_fault *vmf) +{ + struct watch_queue *wqueue = vmf->vma->vm_file->private_data; + struct page *page; + + page = wqueue->pages[vmf->pgoff]; + get_page(page); + if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) { + put_page(page); + return VM_FAULT_RETRY; + } + vmf->page = page; + return VM_FAULT_LOCKED; +} + +static int 
watch_queue_account_mem(struct watch_queue *wqueue,
+				   unsigned long nr_pages)
+{
+	struct user_struct *user = wqueue->owner;
+	unsigned long page_limit, cur_pages, new_pages;
+
+	/* Don't allow more pages than we can safely lock */
+	page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	cur_pages = atomic_long_read(&user->locked_vm);
+
+	/* try_cmpxchg returns true on success, so retry only on failure */
+	do {
+		new_pages = cur_pages + nr_pages;
+		if (new_pages > page_limit && !capable(CAP_IPC_LOCK))
+			return -ENOMEM;
+	} while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages,
+						  new_pages));
+
+	wqueue->nr_pages = nr_pages;
+	return 0;
+}
+
+static void watch_queue_unaccount_mem(struct watch_queue *wqueue)
+{
+	struct user_struct *user = wqueue->owner;
+
+	if (wqueue->nr_pages) {
+		atomic_long_sub(wqueue->nr_pages, &user->locked_vm);
+		wqueue->nr_pages = 0;
+	}
+}
+
+static void watch_queue_map_pages(struct vm_fault *vmf,
+				  pgoff_t start_pgoff, pgoff_t end_pgoff)
+{
+	struct watch_queue *wqueue = vmf->vma->vm_file->private_data;
+	struct page *page;
+
+	rcu_read_lock();
+
+	do {
+		page = wqueue->pages[start_pgoff];
+		if (trylock_page(page)) {
+			vm_fault_t ret;
+			get_page(page);
+			ret = alloc_set_pte(vmf, NULL, page);
+			if (ret != 0)
+				put_page(page);
+
+			unlock_page(page);
+		}
+	} while (++start_pgoff < end_pgoff);
+
+	rcu_read_unlock();
+}
+
+static const struct vm_operations_struct watch_queue_vm_ops = {
+	.fault		= watch_queue_fault,
+	.map_pages	= watch_queue_map_pages,
+};
+
+/*
+ * Map the buffer.
+ */ +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + u8 nr_pages; + + inode_lock(inode); + nr_pages = wqueue->nr_pages; + inode_unlock(inode); + + if (nr_pages == 0 || + vma->vm_pgoff != 0 || + vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE || + !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED))) + return -EINVAL; + + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_ops = &watch_queue_vm_ops; + return 0; +} + +/* + * Allocate the required number of pages. + */ +static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages) +{ + struct watch_queue_buffer *buf; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + int i; + + BUILD_BUG_ON(gran != sizeof(__u64)); + + if (wqueue->buffer) + return -EBUSY; + + if (nr_pages == 0 || + nr_pages > 16 || /* TODO: choose a better hard limit */ + !is_power_of_2(nr_pages)) + return -EINVAL; + + if (watch_queue_account_mem(wqueue, nr_pages) < 0) + goto err; + + wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!wqueue->pages) + goto err_unaccount; + + for (i = 0; i < nr_pages; i++) { + wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!wqueue->pages[i]) + goto err_some_pages; + wqueue->pages[i]->mapping = &wqueue->mapping; + SetPageUptodate(wqueue->pages[i]); + } + + buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED); + if (!buf) + goto err_some_pages; + + wqueue->buffer = buf; + wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification)); + + /* The first four slots in the buffer contain metadata about the ring, + * including the head and tail indices and mask. 
+ */ + buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT; + buf->meta.watch.type = WATCH_TYPE_META; + buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION; + buf->meta.mask = wqueue->size - 1; + buf->meta.head = metalen; + buf->meta.tail = metalen; + return 0; + +err_some_pages: + for (i--; i >= 0; i--) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + put_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + wqueue->pages = NULL; +err_unaccount: + watch_queue_unaccount_mem(wqueue); +err: + return -ENOMEM; +} + +/* + * Set the filter on a watch queue. + */ +static long watch_queue_set_filter(struct inode *inode, + struct watch_queue *wqueue, + struct watch_notification_filter __user *_filter) +{ + struct watch_notification_type_filter *tf; + struct watch_notification_filter filter; + struct watch_type_filter *q; + struct watch_filter *wfilter; + int ret, nr_filter = 0, i; + + if (!_filter) { + /* Remove the old filter */ + wfilter = NULL; + goto set; + } + + /* Grab the user's filter specification */ + if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) + return -EFAULT; + if (filter.nr_filters == 0 || + filter.nr_filters > 16 || + filter.__reserved != 0) + return -EINVAL; + + tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + if (IS_ERR(tf)) + return PTR_ERR(tf); + + ret = -EINVAL; + for (i = 0; i < filter.nr_filters; i++) { + if ((tf[i].info_filter & ~tf[i].info_mask) || + tf[i].info_mask & WATCH_INFO_LENGTH) + goto err_filter; + /* Ignore any unknown types */ + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + nr_filter++; + } + + /* Now we need to build the internal filter from only the relevant + * user-specified filters. 
+	 */
+	ret = -ENOMEM;
+	wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL);
+	if (!wfilter)
+		goto err_filter;
+	wfilter->nr_filters = nr_filter;
+
+	q = wfilter->filters;
+	for (i = 0; i < filter.nr_filters; i++) {
+		/* Must use the same bound as the counting loop above */
+		if (tf[i].type >= sizeof(wfilter->type_filter) * 8)
+			continue;
+
+		q->type			= tf[i].type;
+		q->info_filter		= tf[i].info_filter;
+		q->info_mask		= tf[i].info_mask;
+		q->subtype_filter[0]	= tf[i].subtype_filter[0];
+		__set_bit(q->type, wfilter->type_filter);
+		q++;
+	}
+
+	kfree(tf);
+set:
+	inode_lock(inode);
+	rcu_swap_protected(wqueue->filter, wfilter,
+			   lockdep_is_held(&inode->i_rwsem));
+	inode_unlock(inode);
+	if (wfilter)
+		kfree_rcu(wfilter, rcu);
+	return 0;
+
+err_filter:
+	kfree(tf);
+	return ret;
+}
+
+/*
+ * Set parameters.
+ */
+static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct watch_queue *wqueue = file->private_data;
+	struct inode *inode = file_inode(file);
+	long ret;
+
+	switch (cmd) {
+	case IOC_WATCH_QUEUE_SET_SIZE:
+		inode_lock(inode);
+		ret = watch_queue_set_size(wqueue, arg);
+		inode_unlock(inode);
+		return ret;
+
+	case IOC_WATCH_QUEUE_SET_FILTER:
+		ret = watch_queue_set_filter(
+			inode, wqueue,
+			(struct watch_notification_filter __user *)arg);
+		return ret;
+
+	default:
+		return -ENOTTY;
+	}
+}
+
+/*
+ * Open the file.
+ */
+static int watch_queue_open(struct inode *inode, struct file *file)
+{
+	struct watch_queue *wqueue;
+
+	wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL);
+	if (!wqueue)
+		return -ENOMEM;
+
+	wqueue->mapping.a_ops = &watch_queue_aops;
+	wqueue->mapping.i_mmap = RB_ROOT_CACHED;
+	init_rwsem(&wqueue->mapping.i_mmap_rwsem);
+	spin_lock_init(&wqueue->mapping.private_lock);
+
+	kref_init(&wqueue->usage);
+	spin_lock_init(&wqueue->lock);
+	init_waitqueue_head(&wqueue->waiters);
+	wqueue->owner = get_uid(file->f_cred->user);
+
+	file->private_data = wqueue;
+	return 0;
+}
+
+static void __put_watch_queue(struct kref *kref)
+{
+	struct watch_queue *wqueue =
+		container_of(kref, struct watch_queue, usage);
+	struct watch_filter *wfilter;
+
+	wfilter = rcu_access_pointer(wqueue->filter);
+	if (wfilter)
+		kfree_rcu(wfilter, rcu);
+	free_uid(wqueue->owner);
+	kfree_rcu(wqueue, rcu);
+}
+
+/**
+ * put_watch_queue - Dispose of a ref on a watchqueue.
+ * @wqueue: The watch queue to unref.
+ */
+void put_watch_queue(struct watch_queue *wqueue)
+{
+	kref_put(&wqueue->usage, __put_watch_queue);
+}
+EXPORT_SYMBOL(put_watch_queue);
+
+static void free_watch(struct rcu_head *rcu)
+{
+	struct watch *watch = container_of(rcu, struct watch, rcu);
+
+	put_watch_queue(rcu_access_pointer(watch->queue));
+	put_cred(watch->cred);
+}
+
+static void __put_watch(struct kref *kref)
+{
+	struct watch *watch = container_of(kref, struct watch, usage);
+
+	call_rcu(&watch->rcu, free_watch);
+}
+
+/*
+ * Discard a watch.
+ */
+static void put_watch(struct watch *watch)
+{
+	kref_put(&watch->usage, __put_watch);
+}
+
+/**
+ * init_watch - Initialise a watch
+ * @watch: The watch to initialise.
+ * @wqueue: The queue to assign.
+ *
+ * Initialise a watch and set the watch queue.
+ */ +void init_watch(struct watch *watch, struct watch_queue *wqueue) +{ + kref_init(&watch->usage); + INIT_HLIST_NODE(&watch->list_node); + INIT_HLIST_NODE(&watch->queue_node); + rcu_assign_pointer(watch->queue, wqueue); +} + +/** + * add_watch_to_object - Add a watch on an object to a watch list + * @watch: The watch to add + * @wlist: The watch list to add to + * + * @watch->queue must have been set to point to the queue to post notifications + * to and the watch list of the object to be watched. @watch->cred must also + * have been set to the appropriate credentials and a ref taken on them. + * + * The caller must pin the queue and the list both and must hold the list + * locked against racing watch additions/removals. + */ +int add_watch_to_object(struct watch *watch, struct watch_list *wlist) +{ + struct watch_queue *wqueue = rcu_access_pointer(watch->queue); + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + rcu_assign_pointer(watch->watch_list, wlist); + + spin_lock_bh(&wqueue->lock); + kref_get(&wqueue->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_unlock_bh(&wqueue->lock); + + hlist_add_head(&watch->list_node, &wlist->watchers); + return 0; +} +EXPORT_SYMBOL(add_watch_to_object); + +/** + * remove_watch_from_object - Remove a watch or all watches from an object. + * @wlist: The watch list to remove from + * @wq: The watch queue of interest (ignored if @all is true) + * @id: The ID of the watch to remove (ignored if @all is true) + * @all: True to remove all objects + * + * Remove a specific watch or all watches from an object. A notification is + * sent to the watcher to tell them that this happened. 
+ */ +int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, + u64 id, bool all) +{ + struct watch_notification_removal n; + struct watch_queue *wqueue; + struct watch *watch; + int ret = -EBADSLT; + + rcu_read_lock(); + +again: + spin_lock(&wlist->lock); + hlist_for_each_entry(watch, &wlist->watchers, list_node) { + if (all || + (watch->id == id && rcu_access_pointer(watch->queue) == wq)) + goto found; + } + spin_unlock(&wlist->lock); + goto out; + +found: + ret = 0; + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + spin_unlock(&wlist->lock); + + /* We now own the reference on watch that used to belong to wlist. */ + + n.watch.type = WATCH_TYPE_META; + n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; + n.watch.info = watch->info_id | watch_sizeof(n.watch); + n.id = id; + if (id != 0) + n.watch.info = watch->info_id | watch_sizeof(n); + + wqueue = rcu_dereference(watch->queue); + + /* We don't need the watch list lock for the next bit as RCU is + * protecting *wqueue from deallocation. + */ + if (wqueue) { + post_one_notification(wqueue, &n.watch); + + spin_lock_bh(&wqueue->lock); + + if (!hlist_unhashed(&watch->queue_node)) { + hlist_del_init_rcu(&watch->queue_node); + put_watch(watch); + } + + spin_unlock_bh(&wqueue->lock); + } + + if (wlist->release_watch) { + void (*release_watch)(struct watch *); + + release_watch = wlist->release_watch; + rcu_read_unlock(); + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + + if (all && !hlist_empty(&wlist->watchers)) + goto again; +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(remove_watch_from_object); + +/* + * Remove all the watches that are contributory to a queue. This has the + * potential to race with removal of the watches by the destruction of the + * objects being watched or with the distribution of notifications. 
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */ +static int watch_queue_release(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue = file->private_data; + int i; + + watch_queue_clear(wqueue); + + if (wqueue->buffer) + vunmap(wqueue->buffer); + + for (i = 0; i < wqueue->nr_pages; i++) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + __free_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + watch_queue_unaccount_mem(wqueue); + put_watch_queue(wqueue); + return 0; +} + +static const struct file_operations watch_queue_fops = { + .owner = THIS_MODULE, + .open = watch_queue_open, + .release = watch_queue_release, + .unlocked_ioctl = watch_queue_ioctl, + .poll = watch_queue_poll, + .mmap = watch_queue_mmap, + .llseek = no_llseek, +}; + +/** + * get_watch_queue - Get a watch queue from its file descriptor. + * @fd: The fd to query. + */ +struct watch_queue *get_watch_queue(int fd) +{ + struct watch_queue *wqueue = ERR_PTR(-EBADF); + struct fd f; + + f = fdget(fd); + if (f.file) { + wqueue = ERR_PTR(-EINVAL); + if (f.file->f_op == &watch_queue_fops) { + wqueue = f.file->private_data; + kref_get(&wqueue->usage); + } + fdput(f); + } + + return wqueue; +} +EXPORT_SYMBOL(get_watch_queue); + +static struct miscdevice watch_queue_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "watch_queue", + .fops = &watch_queue_fops, + .mode = 0666, +}; +builtin_misc_device(watch_queue_dev); diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h new file mode 100644 index 000000000000..34d7915cc5b3 --- /dev/null +++ b/include/linux/watch_queue.h @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. + */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + 
+static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. + */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. 
+ * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 04/11] General notification queue with user mmap()'able ring buffer [ver #6] @ 2019-08-29 18:30 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:30 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Implement a misc device that implements a general notification queue as a ring buffer that can be mmap()'d from userspace. The way this is done is: (1) An application opens the device and indicates the size of the ring buffer that it wants to reserve in pages (this can only be set once): fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_NR_PAGES, nr_of_pages); (2) The application should then map the pages that the device has reserved. Each instance of the device created by open() allocates separate pages so that maps of different fds don't interfere with one another. Multiple mmap() calls on the same fd, however, will all work together. page_size = sysconf(_SC_PAGESIZE); mapping_size = nr_of_pages * page_size; char *buf = mmap(NULL, mapping_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); The ring is divided into 8-byte slots. Entries written into the ring are variable size and can use between 1 and 63 slots. A special entry is maintained in the first two slots of the ring that contains the head and tail pointers. This is skipped when the ring wraps round. Note that multislot entries, therefore, aren't allowed to be broken over the end of the ring, but instead "skip" entries are inserted to pad out the buffer. Each entry has a 1-slot header that describes it: struct watch_notification { __u32 type:24; __u32 subtype:8; __u32 info; }; The type indicates the source (eg. mount tree changes, superblock events, keyring changes, block layer events) and the subtype indicates the event type (eg. mount, unmount; EIO, EDQUOT; link, unlink). 
The info field indicates a number of things, including the entry length, an ID assigned to a watchpoint contributing to this buffer, type-specific flags and meta flags, such as an overrun indicator. Supplementary data, such as the key ID that generated an event, are attached in additional slots. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> --- Documentation/ioctl/ioctl-number.rst | 1 Documentation/watch_queue.rst | 429 ++++++++++++++++ drivers/misc/Kconfig | 13 drivers/misc/Makefile | 1 drivers/misc/watch_queue.c | 892 ++++++++++++++++++++++++++++++++++ include/linux/watch_queue.h | 94 ++++ include/uapi/linux/watch_queue.h | 34 + 7 files changed, 1464 insertions(+) create mode 100644 Documentation/watch_queue.rst create mode 100644 drivers/misc/watch_queue.c create mode 100644 include/linux/watch_queue.h diff --git a/Documentation/ioctl/ioctl-number.rst b/Documentation/ioctl/ioctl-number.rst index 7f8dcae7a230..8141ccf2c53a 100644 --- a/Documentation/ioctl/ioctl-number.rst +++ b/Documentation/ioctl/ioctl-number.rst @@ -202,6 +202,7 @@ Code Seq# Include File Comments 'W' 00-1F linux/wanrouter.h conflict! (pre 3.9) 'W' 00-3F sound/asound.h conflict! 'W' 40-5F drivers/pci/switch/switchtec.c +'W' 60-61 linux/watch_queue.h 'X' all fs/xfs/xfs_fs.h, conflict! fs/xfs/linux-2.6/xfs_ioctl32.h, include/linux/falloc.h, diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst new file mode 100644 index 000000000000..6fb3aa3356d3 --- /dev/null +++ b/Documentation/watch_queue.rst @@ -0,0 +1,429 @@ +============== +Mappable notifications queue +============== + +This is a misc device that acts as a mapped ring buffer by which userspace can +receive notifications from the kernel. 
This can be used in conjunction with::
+
+  * Key/keyring notifications
+
+  * General device event notifications
+
+
+The notification buffers can be enabled by:
+
+	"Device Drivers"/"Misc devices"/"Mappable notification queue"
+	(CONFIG_WATCH_QUEUE)
+
+This document has the following sections:
+
+.. contents:: :local:
+
+
+Overview
+========
+
+This facility appears as a misc device file that is opened and then mapped and
+polled. Each time it is opened, it creates a new buffer specific to the
+returned file descriptor. Then, when the opening process sets watches, it
+indicates the particular buffer it wants notifications from that watch to be
+written into. Note that there are no read() and write() methods (except for
+debugging). The user is expected to access the ring directly and to use poll
+to wait for new data.
+
+If a watch is in place, notifications are only written into the buffer if the
+filter criteria are passed and if there's sufficient space available in the
+ring. If either condition is not met, the notification will be discarded. In
+the latter case, an overrun indicator will also be set.
+
+Note that when producing a notification, the kernel does not wait for the
+consumers to collect it, but rather just continues on. This means that
+notifications can be generated whilst spinlocks are held and also protects the
+kernel from being held up indefinitely by a userspace malfunction.
+
+As far as the ring goes, the head index belongs to the kernel and the tail
+index belongs to userspace. The kernel will refuse to write anything if the
+tail index becomes invalid. Userspace *must* use appropriate memory barriers
+between reading or updating the tail index and reading the ring.
+
+
+Record Structure
+================
+
+Notification records in the ring may occupy a variable number of slots within
+the buffer, beginning with a 1-slot header::
+
+	struct watch_notification {
+		__u32	type:24;
+		__u32	subtype:8;
+		__u32	info;
+	} __attribute__((aligned(WATCH_LENGTH_GRANULARITY)));
+
+"type" indicates the source of the notification record and "subtype" indicates
+the type of record from that source (see the Watch Sources section below). The
+type may also be "WATCH_TYPE_META". This is a special record type generated
+internally by the watch queue driver itself. There are two subtypes, one of
+which indicates records that should be just skipped (padding or metadata):
+
+  * WATCH_META_SKIP_NOTIFICATION
+  * WATCH_META_REMOVAL_NOTIFICATION
+
+The former indicates a record that should just be skipped and the latter
+indicates that an object on which a watch was installed was removed or
+destroyed.
+
+"info" indicates a bunch of things, including:
+
+  * The length of the record in units of buffer slots (mask with
+    WATCH_INFO_LENGTH and shift by WATCH_INFO_LENGTH__SHIFT). This indicates
+    the size of the record, which may be between 1 and 63 slots. To turn this
+    into a number of bytes, multiply by WATCH_LENGTH_GRANULARITY.
+
+  * The watch ID (mask with WATCH_INFO_ID and shift by WATCH_INFO_ID__SHIFT).
+    This indicates the caller's ID of the watch, which may be between 0
+    and 255. Multiple watches may share a queue, and this provides a means to
+    distinguish them.
+
+  * In the metadata header in slot 0, a flag (WATCH_INFO_NOTIFICATIONS_LOST)
+    that indicates that some notifications were lost for some reason, including
+    buffer overrun, insufficient memory and inconsistent tail index.
+
+  * A type-specific field (WATCH_INFO_TYPE_INFO). This is set by the
+    notification producer to indicate some meaning specific to the type and
+    subtype.
+
+Everything in info apart from the length can be used for filtering.
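
[Editorial sketch, not part of the patch] The "info" field decoding described in the Record Structure section can be illustrated with a few helpers. This is a minimal sketch only: the EX_* masks and shifts below are placeholder values assumed for illustration (the real WATCH_INFO_* definitions live in the uapi header, which is not shown in this message), and the ex_* function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder field layout, assumed for illustration only.  The real
 * masks and shifts are the WATCH_INFO_* macros in the uapi header.
 */
#define EX_INFO_LENGTH		0x0000003fu	/* record length in slots */
#define EX_INFO_LENGTH_SHIFT	0
#define EX_INFO_ID		0x0000ff00u	/* caller-supplied watch ID */
#define EX_INFO_ID_SHIFT	8
#define EX_LENGTH_GRANULARITY	8u		/* bytes per slot */

/* Number of 8-byte slots the record occupies (mask, then shift). */
static inline unsigned int ex_record_slots(uint32_t info)
{
	return (info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
}

/* Record size in bytes: slot count times the slot granularity. */
static inline size_t ex_record_bytes(uint32_t info)
{
	return (size_t)ex_record_slots(info) * EX_LENGTH_GRANULARITY;
}

/* The 8-bit ID that distinguishes watches sharing one queue. */
static inline unsigned int ex_watch_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
}
```

With the placeholder layout above, an info value of 0x103 would describe a 3-slot (24-byte) record posted by the watch with ID 1.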
+
+
+Ring Structure
+==============
+
+The ring is divided into slots of size WATCH_LENGTH_GRANULARITY (8 bytes). The
+caller uses an ioctl() to set the size of the ring after opening and this must
+be a power-of-2 multiple of the system page size (so that the mask can be used
+with AND).
+
+The head and tail indices are stored in the first two slots in the ring, which
+are marked out as a skippable entry::
+
+	struct watch_queue_buffer {
+		union {
+			struct {
+				struct watch_notification watch;
+				volatile __u32	head;
+				volatile __u32	tail;
+				__u32		mask;
+			} meta;
+			struct watch_notification slots[0];
+		};
+	};
+
+In "meta.watch", type will be set to WATCH_TYPE_META and subtype to
+WATCH_META_SKIP_NOTIFICATION so that anyone processing the buffer will just
+skip this record. Also, because this record is here, records cannot wrap round
+the end of the buffer, so a skippable padding element will be inserted at the
+end of the buffer if needed. Thus the contents of a notification record in the
+buffer are always contiguous.
+
+"meta.mask" is an AND'able mask to turn the index counters into slots array
+indices.
+
+The buffer is empty if "meta.head" == "meta.tail".
+
+[!] NOTE that the ring indices "meta.head" and "meta.tail" are indices into
+"slots[]" not byte offsets into the buffer.
+
+[!] NOTE that userspace must never change the head pointer. This belongs to
+the kernel and will be updated by that. The kernel will never change the tail
+pointer.
+
+[!] NOTE that userspace must never AND-off the tail pointer before updating it,
+but should just keep adding to it and letting it wrap naturally. The value
+*should* be masked off when used as an index into slots[].
+
+[!] NOTE that if the distance between head and tail becomes too great, the
+kernel will assume the buffer is full and write no more until the issue is
+resolved.
+
+
+Watch List (Notification Source) API
+====================================
+
+A "watch list" is a list of watchers that are subscribed to a source of
+notifications.
A list may be attached to an object (say a key or a superblock)
+or may be global (say for device events). From a userspace perspective, a
+non-global watch list is typically referred to by reference to the object it
+belongs to (such as using KEYCTL_NOTIFY and giving it a key serial number to
+watch that specific key).
+
+To manage a watch list, the following functions are provided:
+
+  * ``void init_watch_list(struct watch_list *wlist,
+			   void (*release_watch)(struct watch *wlist));``
+
+    Initialise a watch list. If ``release_watch`` is not NULL, then this
+    indicates a function that should be called when the watch_list object is
+    destroyed to discard any references the watch list holds on the watched
+    object.
+
+  * ``void remove_watch_list(struct watch_list *wlist);``
+
+    This removes all of the watches subscribed to a watch_list and frees them
+    and then destroys the watch_list object itself.
+
+
+Watch Queue (Notification Buffer) API
+=====================================
+
+A "watch queue" is the buffer allocated by or on behalf of the application that
+notification records will be written into. The workings of this are hidden
+entirely inside of the watch_queue device driver, but it is necessary to gain a
+reference to it to place a watch. These can be managed with:
+
+  * ``struct watch_queue *get_watch_queue(int fd);``
+
+    Since watch queues are indicated to the kernel by the fd of the character
+    device that implements the buffer, userspace must hand that fd through a
+    system call. This can be used to look up an opaque pointer to the watch
+    queue from the system call.
+
+  * ``void put_watch_queue(struct watch_queue *wqueue);``
+
+    This discards the reference obtained from ``get_watch_queue()``.
+
+
+Watch Subscription API
+======================
+
+A "watch" is a subscription on a watch list, indicating the watch queue, and
+thus the buffer, into which notification records should be written. The watch
+queue object may also carry filtering rules for that object, as set by
+userspace.
Some parts of the watch struct can be set by the driver::
+
+	struct watch {
+		union {
+			u32		info_id;	/* ID to be OR'd in to info field */
+			...
+		};
+		void			*private;	/* Private data for the watched object */
+		u64			id;		/* Internal identifier */
+		...
+	};
+
+The ``info_id`` value should be an 8-bit number obtained from userspace and
+shifted by WATCH_INFO_ID__SHIFT. This is OR'd into the WATCH_INFO_ID field of
+struct watch_notification::info when and if the notification is written into
+the associated watch queue buffer.
+
+The ``private`` field is the driver's data associated with the watch_list and
+is cleaned up by the ``watch_list::release_watch()`` method.
+
+The ``id`` field is the source's ID. Notifications that are posted with a
+different ID are ignored.
+
+The following functions are provided to manage watches:
+
+  * ``void init_watch(struct watch *watch, struct watch_queue *wqueue);``
+
+    Initialise a watch object, setting its pointer to the watch queue, using
+    appropriate barriering to avoid lockdep complaints.
+
+  * ``int add_watch_to_object(struct watch *watch, struct watch_list *wlist);``
+
+    Subscribe a watch to a watch list (notification source). The
+    driver-settable fields in the watch struct must have been set before this
+    is called.
+
+  * ``int remove_watch_from_object(struct watch_list *wlist,
+				   struct watch_queue *wqueue,
+				   u64 id, false);``
+
+    Remove a watch from a watch list, where the watch must match the specified
+    watch queue (``wqueue``) and object identifier (``id``). A notification
+    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue to
+    indicate that the watch got removed.
+
+  * ``int remove_watch_from_object(struct watch_list *wlist, NULL, 0, true);``
+
+    Remove all the watches from a watch list. It is expected that this will be
+    called preparatory to destruction and that the watch list will be
+    inaccessible to new watches by this point.
A notification
+    (``WATCH_META_REMOVAL_NOTIFICATION``) is sent to the watch queue of each
+    subscribed watch to indicate that the watch got removed.
+
+
+Notification Posting API
+========================
+
+To post a notification to a watch list so that the subscribed watches can see
+it, the following function should be used::
+
+	void post_watch_notification(struct watch_list *wlist,
+				     struct watch_notification *n,
+				     const struct cred *cred,
+				     u64 id);
+
+The notification should be preformatted and a pointer to the header (``n``)
+should be passed in. The notification may be larger than this and the size in
+units of buffer slots is noted in ``n->info & WATCH_INFO_LENGTH``.
+
+The ``cred`` struct indicates the credentials of the source (subject) and is
+passed to the LSMs, such as SELinux, to allow or suppress the recording of the
+note in each individual queue according to the credentials of that queue
+(object).
+
+The ``id`` is the ID of the source object (such as the serial number on a key).
+Only watches that have the same ID set in them will see this notification.
+
+
+Watch Sources
+=============
+
+Any particular buffer can be fed from multiple sources. Sources include:
+
+  * WATCH_TYPE_KEY_NOTIFY
+
+    Notifications of this type indicate changes to keys and keyrings, including
+    the changes of keyring contents or the attributes of keys.
+
+    See Documentation/security/keys/core.rst for more information.
+
+  * WATCH_TYPE_BLOCK_NOTIFY
+
+    Notifications of this type indicate block layer events, such as I/O errors
+    or temporary link loss. Watches of this type are set on a global queue.
+
+
+Event Filtering
+===============
+
+Once a watch queue has been created, a set of filters can be applied to limit
+the events that are received using::
+
+	struct watch_notification_filter filter = {
+		...
+	};
+	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter)
+
+The filter description is a variable of type::
+
+	struct watch_notification_filter {
+		__u32	nr_filters;
+		__u32	__reserved;
+		struct watch_notification_type_filter filters[];
+	};
+
+Where "nr_filters" is the number of filters in filters[] and "__reserved"
+should be 0. The "filters" array has elements of the following type::
+
+	struct watch_notification_type_filter {
+		__u32	type;
+		__u32	info_filter;
+		__u32	info_mask;
+		__u32	subtype_filter[8];
+	};
+
+Where:
+
+  * ``type`` is the event type to filter for and should be something like
+    "WATCH_TYPE_KEY_NOTIFY"
+
+  * ``info_filter`` and ``info_mask`` act as a filter on the info field of the
+    notification record. The notification is only written into the buffer if::
+
+	(watch.info & info_mask) == info_filter
+
+    This could be used, for example, to ignore events that are not exactly on
+    the watched point in a mount tree.
+
+  * ``subtype_filter`` is a bitmask indicating the subtypes that are of
+    interest. Bit 0 of subtype_filter[0] corresponds to subtype 0, bit 1 to
+    subtype 1, and so on.
+
+If the argument to the ioctl() is NULL, then the filters will be removed and
+all events from the watched sources will come through.
+
+
+Waiting For Events
+==================
+
+The file descriptor that holds the buffer may be used with poll() and similar.
+POLLIN and POLLRDNORM are set if the buffer indices differ. POLLERR is set if
+the buffer indices are further apart than the size of the buffer. Wake-up
+events are only generated if the buffer is transitioned from an empty state.
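
[Editorial sketch, not part of the patch] The two matching rules from the Event Filtering section (the 256-bit subtype bitmask spread over eight 32-bit words, and the masked comparison on "info") might be expressed as below. The ex_* helper names are hypothetical and exist only for this illustration; the word/bit arithmetic follows the "bit 0 of subtype_filter[0] corresponds to subtype 0" rule stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which of the eight 32-bit words holds the bit for this subtype. */
static inline unsigned int ex_subtype_word(uint8_t subtype)
{
	return subtype >> 5;
}

/* The bit within that word: bit 0 of word 0 is subtype 0, and so on. */
static inline uint32_t ex_subtype_bit(uint8_t subtype)
{
	return UINT32_C(1) << (subtype & 31);
}

/* Mark a subtype as being of interest in a subtype_filter[8]-style array. */
static inline void ex_subtype_allow(uint32_t filter[8], uint8_t subtype)
{
	filter[ex_subtype_word(subtype)] |= ex_subtype_bit(subtype);
}

/* Check whether a subtype is selected by the bitmask. */
static inline bool ex_subtype_allowed(const uint32_t filter[8], uint8_t subtype)
{
	return (filter[ex_subtype_word(subtype)] & ex_subtype_bit(subtype)) != 0;
}

/* The info rule: only the bits selected by the mask are compared. */
static inline bool ex_info_matches(uint32_t info, uint32_t info_mask,
				   uint32_t info_filter)
{
	return (info & info_mask) == info_filter;
}
```

For example, subtype 37 lands in word 1 at bit 5, so a caller interested only in that subtype would end up with subtype_filter[1] set to 0x20 and all other words zero.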
+
+
+Userspace Code Example
+======================
+
+A buffer is created with something like the following::
+
+	fd = open("/dev/watch_queue", O_RDWR);
+
+	#define BUF_SIZE 4
+	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE);
+
+	page_size = sysconf(_SC_PAGESIZE);
+	buf = mmap(NULL, BUF_SIZE * page_size,
+		   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+It can then be set to receive keyring change notifications and device event
+notifications::
+
+	keyctl(KEYCTL_WATCH_KEY, KEY_SPEC_SESSION_KEYRING, fd, 0x01);
+
+	watch_devices(fd, 0x2);
+
+The notifications can then be consumed by something like the following::
+
+	extern void saw_key_change(struct watch_notification *n);
+	extern void saw_block_event(struct watch_notification *n);
+	extern void saw_usb_event(struct watch_notification *n);
+
+	static int consumer(int fd, struct watch_queue_buffer *buf)
+	{
+		struct watch_notification *n;
+		struct pollfd p[1];
+		unsigned int len, head, tail, mask = buf->meta.mask;
+
+		for (;;) {
+			p[0].fd = fd;
+			p[0].events = POLLIN | POLLERR;
+			p[0].revents = 0;
+
+			if (poll(p, 1, -1) == -1 || p[0].revents & POLLERR)
+				goto went_wrong;
+
+			while (head = _atomic_load_acquire(buf->meta.head),
+			       tail = buf->meta.tail,
+			       tail != head
+			       ) {
+				n = &buf->slots[tail & mask];
+				len = (n->info & WATCH_INFO_LENGTH) >>
+					WATCH_INFO_LENGTH__SHIFT;
+				if (len == 0)
+					goto went_wrong;
+
+				switch (n->type) {
+				case WATCH_TYPE_KEY_NOTIFY:
+					saw_key_change(n);
+					break;
+				case WATCH_TYPE_BLOCK_NOTIFY:
+					saw_block_event(n);
+					break;
+				case WATCH_TYPE_USB_NOTIFY:
+					saw_usb_event(n);
+					break;
+				}
+
+				tail += len;
+				_atomic_store_release(buf->meta.tail, tail);
+			}
+		}
+
+	went_wrong:
+		return 0;
+	}
+
+Note the memory barriers when loading the head pointer and storing the tail
+pointer!
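
[Editorial sketch, not part of the patch] The consumer loop above uses _atomic_load_acquire() and _atomic_store_release() as placeholders. One way the drain step might look with real primitives is sketched below, assuming the GCC/Clang __atomic builtins are available. Everything prefixed ex_ or EX_ is hypothetical: the length mask is a placeholder value, the structure is a local stand-in for struct watch_notification, and the head and tail indices are passed as separate pointers for simplicity, whereas in the layout described above they live in the "meta" record at the start of the mapped buffer.

```c
#include <stdint.h>

/* Placeholder length field, assumed for illustration only. */
#define EX_INFO_LENGTH		0x0000003fu
#define EX_INFO_LENGTH_SHIFT	0

/* Local stand-in for the 1-slot record header. */
struct ex_notification {
	uint32_t type:24;
	uint32_t subtype:8;
	uint32_t info;
};

typedef void (*ex_handler)(const struct ex_notification *n, void *ctx);

/* Consume every record between tail and head.  Returns the number of
 * records handed to fn, or -1 if a zero-length record is found.
 */
static int ex_drain(const struct ex_notification *slots,
		    const uint32_t *head_p, uint32_t *tail_p,
		    uint32_t mask, ex_handler fn, void *ctx)
{
	uint32_t tail = *tail_p;	/* only the consumer writes tail */
	int handled = 0;

	for (;;) {
		/* Acquire: record contents are read only after head. */
		uint32_t head = __atomic_load_n(head_p, __ATOMIC_ACQUIRE);
		const struct ex_notification *n;
		uint32_t len;

		if (head == tail)
			break;		/* ring is empty */

		n = &slots[tail & mask];	/* mask only when indexing */
		len = (n->info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT;
		if (len == 0)
			return -1;	/* corrupt ring; give up */

		fn(n, ctx);
		handled++;

		/* Release: the slot is handed back only after it was read. */
		tail += len;		/* unmasked; wraps naturally */
		__atomic_store_n(tail_p, tail, __ATOMIC_RELEASE);
	}
	return handled;
}

/* Tiny in-memory demonstration: two records (1 slot and 2 slots). */
static void ex_count(const struct ex_notification *n, void *ctx)
{
	(void)n;
	(*(int *)ctx)++;
}

static int ex_demo(void)
{
	struct ex_notification slots[8] = {0};
	uint32_t head = 5, tail = 2;
	int seen = 0;

	slots[2].type = 1;
	slots[2].info = 1;	/* one-slot record at index 2 */
	slots[3].type = 1;
	slots[3].info = 2;	/* two-slot record at indices 3-4 */

	if (ex_drain(slots, &head, &tail, 7, ex_count, &seen) != 2)
		return -1;
	return (tail == head) ? seen : -1;
}
```

The tail index is advanced without masking and published with release semantics, matching the notes in the Ring Structure section. A real consumer would call something like ex_drain() each time poll() reports POLLIN.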
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 16900357afc2..09d7677e8df0 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -5,6 +5,19 @@ menu "Misc devices" +config WATCH_QUEUE + bool "Mappable notification queue" + default n + depends on MMU + help + This is a general notification queue for the kernel to pass events to + userspace through a mmap()'able ring buffer. It can be used in + conjunction with watches for key/keyring change notifications and device + notifications. + + Note that in theory this should work fine with NOMMU, but I'm not + sure how to make that work. + config SENSORS_LIS3LV02D tristate depends on INPUT diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index abd8ae249746..d36b14a5cb79 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -3,6 +3,7 @@ # Makefile for misc devices that really don't fit anywhere else. # +obj-$(CONFIG_WATCH_QUEUE) += watch_queue.o obj-$(CONFIG_IBM_ASM) += ibmasm/ obj-$(CONFIG_IBMVMC) += ibmvmc.o obj-$(CONFIG_AD525X_DPOT) += ad525x_dpot.o diff --git a/drivers/misc/watch_queue.c b/drivers/misc/watch_queue.c new file mode 100644 index 000000000000..287e7631feaf --- /dev/null +++ b/drivers/misc/watch_queue.c @@ -0,0 +1,892 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#define pr_fmt(fmt) "watchq: " fmt +#include <linux/module.h> +#include <linux/init.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/printk.h> +#include <linux/miscdevice.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/poll.h> +#include <linux/uaccess.h> +#include <linux/vmalloc.h> +#include <linux/file.h> +#include <linux/security.h> +#include <linux/cred.h> +#include <linux/sched/signal.h> +#include <linux/watch_queue.h> + +MODULE_DESCRIPTION("Watch queue"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +struct watch_type_filter { + enum watch_notification_type type; + __u32 subtype_filter[1]; /* Bitmask of subtypes to filter on */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ +}; + +struct watch_filter { + union { + struct rcu_head rcu; + unsigned long type_filter[2]; /* Bitmask of accepted types */ + }; + u32 nr_filters; /* Number of filters */ + struct watch_type_filter filters[]; +}; + +struct watch_queue { + struct rcu_head rcu; + struct address_space mapping; + struct user_struct *owner; /* Owner of the queue for rlimit purposes */ + struct watch_filter __rcu *filter; + wait_queue_head_t waiters; + struct hlist_head watches; /* Contributory watches */ + struct kref usage; /* Object usage count */ + spinlock_t lock; + bool defunct; /* T when queues closed */ + u8 nr_pages; /* Size of pages[] */ + u8 flag_next; /* Flag to apply to next item */ + u32 size; + struct watch_queue_buffer *buffer; /* Pointer to first record */ + + /* The mappable pages. The zeroth page holds the ring pointers. */ + struct page **pages; +}; + +/* + * Write a notification of an event into an mmap'd queue and let the user know. + * Returns true if successful and false on failure (eg. 
buffer overrun or
+ * userspace mucked up the ring indices).
+ */
+static bool write_one_notification(struct watch_queue *wqueue,
+				   struct watch_notification *n)
+{
+	struct watch_queue_buffer *buf = wqueue->buffer;
+	struct watch_notification *p;
+	unsigned int gran = WATCH_LENGTH_GRANULARITY;
+	unsigned int metalen = sizeof(buf->meta) / gran;
+	unsigned int size = wqueue->size, mask = size - 1;
+	unsigned int len;
+	unsigned int ring_tail, tail, head, used, gap, h;
+
+	ring_tail = READ_ONCE(buf->meta.tail);
+	head = READ_ONCE(buf->meta.head);
+	used = head - ring_tail;
+
+	/* Check to see if userspace mucked up the pointers */
+	if (used >= size)
+		goto lost_event; /* Inconsistent */
+	tail = ring_tail & mask;
+	if (tail > 0 && tail < metalen)
+		goto lost_event; /* Inconsistent */
+
+	len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+	h = head & mask;
+	if (h >= tail) {
+		/* Head is at or after tail in the buffer. There may then be
+		 * two gaps: one to the end of buffer and one at the beginning
+		 * of the buffer between the metadata block and the tail
+		 * pointer.
+		 */
+		gap = size - h;
+		if (len > gap) {
+			/* Not enough space in the post-head gap; we need to
+			 * wrap. When wrapping, we will have to skip the
+			 * metadata at the beginning of the buffer.
+			 */
+			if (len > tail - metalen)
+				goto lost_event; /* Overrun */
+
+			/* Fill the space at the end of the page */
+			p = &buf->slots[h];
+			p->type = WATCH_TYPE_META;
+			p->subtype = WATCH_META_SKIP_NOTIFICATION;
+			p->info = gap << WATCH_INFO_LENGTH__SHIFT;
+			head += gap;
+			h = 0;
+			if (h >= tail)
+				goto lost_event; /* Overrun */
+		}
+	}
+
+	if (h == 0) {
+		/* Reset and skip the header metadata */
+		p = &buf->meta.watch;
+		p->type = WATCH_TYPE_META;
+		p->subtype = WATCH_META_SKIP_NOTIFICATION;
+		p->info = metalen << WATCH_INFO_LENGTH__SHIFT;
+		head += metalen;
+		h = metalen;
+		if (h == tail)
+			goto lost_event; /* Overrun */
+	}
+
+	if (h < tail) {
+		/* Head is before tail in the buffer.
*/
+		gap = tail - h;
+		if (len > gap)
+			goto lost_event; /* Overrun */
+	}
+
+	n->info |= wqueue->flag_next;
+	wqueue->flag_next = 0;
+	p = &buf->slots[h];
+	memcpy(p, n, len * gran);
+	head += len;
+
+	smp_store_release(&buf->meta.head, head);
+	if (used == 0)
+		wake_up(&wqueue->waiters);
+	return true;
+
+lost_event:
+	WRITE_ONCE(buf->meta.watch.info,
+		   buf->meta.watch.info | WATCH_INFO_NOTIFICATIONS_LOST);
+	return false;
+}
+
+/*
+ * Post a notification to a watch queue.
+ */
+static bool post_one_notification(struct watch_queue *wqueue,
+				  struct watch_notification *n)
+{
+	bool done = false;
+
+	if (!wqueue->buffer)
+		return false;
+
+	spin_lock_bh(&wqueue->lock); /* Protect head pointer */
+
+	if (!wqueue->defunct)
+		done = write_one_notification(wqueue, n);
+	spin_unlock_bh(&wqueue->lock);
+	return done;
+}
+
+/*
+ * Apply filter rules to a notification.
+ */
+static bool filter_watch_notification(const struct watch_filter *wf,
+				      const struct watch_notification *n)
+{
+	const struct watch_type_filter *wt;
+	int i;
+
+	if (!test_bit(n->type, wf->type_filter))
+		return false;
+
+	for (i = 0; i < wf->nr_filters; i++) {
+		wt = &wf->filters[i];
+		if (n->type == wt->type &&
+		    (wt->subtype_filter[n->subtype >> 5] &
+		     (1U << (n->subtype & 31))) &&
+		    (n->info & wt->info_mask) == wt->info_filter)
+			return true;
+	}
+
+	return false; /* If there is a filter, the default is to reject. */
+}
+
+/**
+ * __post_watch_notification - Post an event notification
+ * @wlist: The watch list to post the event to.
+ * @n: The notification record to post.
+ * @cred: The creds of the process that triggered the notification.
+ * @id: The ID to match on the watch.
+ *
+ * Post a notification of an event into a set of watch queues and let the users
+ * know.
+ *
+ * The size of the notification should be set in n->info & WATCH_INFO_LENGTH and
+ * should be in units of sizeof(*n).
+ */
+void __post_watch_notification(struct watch_list *wlist,
+			       struct watch_notification *n,
+			       const struct cred *cred,
+			       u64 id)
+{
+	const struct watch_filter *wf;
+	struct watch_queue *wqueue;
+	struct watch *watch;
+
+	if (((n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT) == 0) {
+		WARN_ON(1);
+		return;
+	}
+
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(watch, &wlist->watchers, list_node) {
+		if (watch->id != id)
+			continue;
+		n->info &= ~WATCH_INFO_ID;
+		n->info |= watch->info_id;
+
+		wqueue = rcu_dereference(watch->queue);
+		wf = rcu_dereference(wqueue->filter);
+		if (wf && !filter_watch_notification(wf, n))
+			continue;
+
+		if (security_post_notification(watch->cred, cred, n) < 0)
+			continue;
+
+		post_one_notification(wqueue, n);
+	}
+
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(__post_watch_notification);
+
+/*
+ * Allow the queue to be polled.
+ */
+static __poll_t watch_queue_poll(struct file *file, poll_table *wait)
+{
+	struct watch_queue *wqueue = file->private_data;
+	struct watch_queue_buffer *buf = wqueue->buffer;
+	unsigned int head, tail;
+	__poll_t mask = 0;
+
+	if (!buf)
+		return EPOLLERR;
+
+	poll_wait(file, &wqueue->waiters, wait);
+
+	head = READ_ONCE(buf->meta.head);
+	tail = READ_ONCE(buf->meta.tail);
+	if (head != tail)
+		mask |= EPOLLIN | EPOLLRDNORM;
+	if (head - tail > wqueue->size)
+		mask |= EPOLLERR;
+	return mask;
+}
+
+static int watch_queue_set_page_dirty(struct page *page)
+{
+	SetPageDirty(page);
+	return 0;
+}
+
+static const struct address_space_operations watch_queue_aops = {
+	.set_page_dirty	= watch_queue_set_page_dirty,
+};
+
+static vm_fault_t watch_queue_fault(struct vm_fault *vmf)
+{
+	struct watch_queue *wqueue = vmf->vma->vm_file->private_data;
+	struct page *page;
+
+	page = wqueue->pages[vmf->pgoff];
+	get_page(page);
+	if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
+		put_page(page);
+		return VM_FAULT_RETRY;
+	}
+	vmf->page = page;
+	return VM_FAULT_LOCKED;
+}
+
+static int
watch_queue_account_mem(struct watch_queue *wqueue,
+				   unsigned long nr_pages)
+{
+	struct user_struct *user = wqueue->owner;
+	unsigned long page_limit, cur_pages, new_pages;
+
+	/* Don't allow more pages than we can safely lock */
+	page_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	cur_pages = atomic_long_read(&user->locked_vm);
+
+	/* Retry until the compare-and-exchange succeeds; on failure,
+	 * cur_pages is refreshed with the current value.
+	 */
+	do {
+		new_pages = cur_pages + nr_pages;
+		if (new_pages > page_limit && !capable(CAP_IPC_LOCK))
+			return -ENOMEM;
+	} while (!atomic_long_try_cmpxchg_relaxed(&user->locked_vm, &cur_pages,
+						  new_pages));
+
+	wqueue->nr_pages = nr_pages;
+	return 0;
+}
+
+static void watch_queue_unaccount_mem(struct watch_queue *wqueue)
+{
+	struct user_struct *user = wqueue->owner;
+
+	if (wqueue->nr_pages) {
+		atomic_long_sub(wqueue->nr_pages, &user->locked_vm);
+		wqueue->nr_pages = 0;
+	}
+}
+
+static void watch_queue_map_pages(struct vm_fault *vmf,
+				  pgoff_t start_pgoff, pgoff_t end_pgoff)
+{
+	struct watch_queue *wqueue = vmf->vma->vm_file->private_data;
+	struct page *page;
+
+	rcu_read_lock();
+
+	do {
+		page = wqueue->pages[start_pgoff];
+		if (trylock_page(page)) {
+			vm_fault_t ret;
+			get_page(page);
+			ret = alloc_set_pte(vmf, NULL, page);
+			if (ret != 0)
+				put_page(page);
+
+			unlock_page(page);
+		}
+	} while (++start_pgoff < end_pgoff);
+
+	rcu_read_unlock();
+}
+
+static const struct vm_operations_struct watch_queue_vm_ops = {
+	.fault		= watch_queue_fault,
+	.map_pages	= watch_queue_map_pages,
+};
+
+/*
+ * Map the buffer.
+ */ +static int watch_queue_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + u8 nr_pages; + + inode_lock(inode); + nr_pages = wqueue->nr_pages; + inode_unlock(inode); + + if (nr_pages == 0 || + vma->vm_pgoff != 0 || + vma->vm_end - vma->vm_start > nr_pages * PAGE_SIZE || + !(pgprot_val(vma->vm_page_prot) & pgprot_val(PAGE_SHARED))) + return -EINVAL; + + vma->vm_flags |= VM_DONTEXPAND; + vma->vm_ops = &watch_queue_vm_ops; + return 0; +} + +/* + * Allocate the required number of pages. + */ +static long watch_queue_set_size(struct watch_queue *wqueue, unsigned long nr_pages) +{ + struct watch_queue_buffer *buf; + unsigned int gran = WATCH_LENGTH_GRANULARITY; + unsigned int metalen = sizeof(buf->meta) / gran; + int i; + + BUILD_BUG_ON(gran != sizeof(__u64)); + + if (wqueue->buffer) + return -EBUSY; + + if (nr_pages == 0 || + nr_pages > 16 || /* TODO: choose a better hard limit */ + !is_power_of_2(nr_pages)) + return -EINVAL; + + if (watch_queue_account_mem(wqueue, nr_pages) < 0) + goto err; + + wqueue->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!wqueue->pages) + goto err_unaccount; + + for (i = 0; i < nr_pages; i++) { + wqueue->pages[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!wqueue->pages[i]) + goto err_some_pages; + wqueue->pages[i]->mapping = &wqueue->mapping; + SetPageUptodate(wqueue->pages[i]); + } + + buf = vmap(wqueue->pages, nr_pages, VM_MAP, PAGE_SHARED); + if (!buf) + goto err_some_pages; + + wqueue->buffer = buf; + wqueue->size = ((nr_pages * PAGE_SIZE) / sizeof(struct watch_notification)); + + /* The first four slots in the buffer contain metadata about the ring, + * including the head and tail indices and mask.
+ */ + buf->meta.watch.info = metalen << WATCH_INFO_LENGTH__SHIFT; + buf->meta.watch.type = WATCH_TYPE_META; + buf->meta.watch.subtype = WATCH_META_SKIP_NOTIFICATION; + buf->meta.mask = wqueue->size - 1; + buf->meta.head = metalen; + buf->meta.tail = metalen; + return 0; + +err_some_pages: + for (i--; i >= 0; i--) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + put_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + wqueue->pages = NULL; +err_unaccount: + watch_queue_unaccount_mem(wqueue); +err: + return -ENOMEM; +} + +/* + * Set the filter on a watch queue. + */ +static long watch_queue_set_filter(struct inode *inode, + struct watch_queue *wqueue, + struct watch_notification_filter __user *_filter) +{ + struct watch_notification_type_filter *tf; + struct watch_notification_filter filter; + struct watch_type_filter *q; + struct watch_filter *wfilter; + int ret, nr_filter = 0, i; + + if (!_filter) { + /* Remove the old filter */ + wfilter = NULL; + goto set; + } + + /* Grab the user's filter specification */ + if (copy_from_user(&filter, _filter, sizeof(filter)) != 0) + return -EFAULT; + if (filter.nr_filters == 0 || + filter.nr_filters > 16 || + filter.__reserved != 0) + return -EINVAL; + + tf = memdup_user(_filter->filters, filter.nr_filters * sizeof(*tf)); + if (IS_ERR(tf)) + return PTR_ERR(tf); + + ret = -EINVAL; + for (i = 0; i < filter.nr_filters; i++) { + if ((tf[i].info_filter & ~tf[i].info_mask) || + tf[i].info_mask & WATCH_INFO_LENGTH) + goto err_filter; + /* Ignore any unknown types */ + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + nr_filter++; + } + + /* Now we need to build the internal filter from only the relevant + * user-specified filters.
+ */ + ret = -ENOMEM; + wfilter = kzalloc(struct_size(wfilter, filters, nr_filter), GFP_KERNEL); + if (!wfilter) + goto err_filter; + wfilter->nr_filters = nr_filter; + + q = wfilter->filters; + for (i = 0; i < filter.nr_filters; i++) { + if (tf[i].type >= sizeof(wfilter->type_filter) * 8) + continue; + + q->type = tf[i].type; + q->info_filter = tf[i].info_filter; + q->info_mask = tf[i].info_mask; + q->subtype_filter[0] = tf[i].subtype_filter[0]; + __set_bit(q->type, wfilter->type_filter); + q++; + } + + kfree(tf); +set: + inode_lock(inode); + rcu_swap_protected(wqueue->filter, wfilter, + lockdep_is_held(&inode->i_rwsem)); + inode_unlock(inode); + if (wfilter) + kfree_rcu(wfilter, rcu); + return 0; + +err_filter: + kfree(tf); + return ret; +} + +/* + * Set parameters. + */ +static long watch_queue_ioctl(struct file *file, unsigned int cmd, unsigned long arg) +{ + struct watch_queue *wqueue = file->private_data; + struct inode *inode = file_inode(file); + long ret; + + switch (cmd) { + case IOC_WATCH_QUEUE_SET_SIZE: + inode_lock(inode); + ret = watch_queue_set_size(wqueue, arg); + inode_unlock(inode); + return ret; + + case IOC_WATCH_QUEUE_SET_FILTER: + ret = watch_queue_set_filter( + inode, wqueue, + (struct watch_notification_filter __user *)arg); + return ret; + + default: + return -ENOTTY; + } +} + +/* + * Open the file.
+ */ +static int watch_queue_open(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue; + + wqueue = kzalloc(sizeof(*wqueue), GFP_KERNEL); + if (!wqueue) + return -ENOMEM; + + wqueue->mapping.a_ops = &watch_queue_aops; + wqueue->mapping.i_mmap = RB_ROOT_CACHED; + init_rwsem(&wqueue->mapping.i_mmap_rwsem); + spin_lock_init(&wqueue->mapping.private_lock); + + kref_init(&wqueue->usage); + spin_lock_init(&wqueue->lock); + init_waitqueue_head(&wqueue->waiters); + wqueue->owner = get_uid(file->f_cred->user); + + file->private_data = wqueue; + return 0; +} + +static void __put_watch_queue(struct kref *kref) +{ + struct watch_queue *wqueue = + container_of(kref, struct watch_queue, usage); + struct watch_filter *wfilter; + + wfilter = rcu_access_pointer(wqueue->filter); + if (wfilter) + kfree_rcu(wfilter, rcu); + free_uid(wqueue->owner); + kfree_rcu(wqueue, rcu); +} + +/** + * put_watch_queue - Dispose of a ref on a watchqueue. + * @wqueue: The watch queue to unref. + */ +void put_watch_queue(struct watch_queue *wqueue) +{ + kref_put(&wqueue->usage, __put_watch_queue); +} +EXPORT_SYMBOL(put_watch_queue); + +static void free_watch(struct rcu_head *rcu) +{ + struct watch *watch = container_of(rcu, struct watch, rcu); + + put_watch_queue(rcu_access_pointer(watch->queue)); + put_cred(watch->cred); +} + +static void __put_watch(struct kref *kref) +{ + struct watch *watch = container_of(kref, struct watch, usage); + + call_rcu(&watch->rcu, free_watch); +} + +/* + * Discard a watch. + */ +static void put_watch(struct watch *watch) +{ + kref_put(&watch->usage, __put_watch); +} + +/** + * init_watch - Initialise a watch + * @watch: The watch to initialise. + * @wqueue: The queue to assign. + * + * Initialise a watch and set the watch queue.
+ */ +void init_watch(struct watch *watch, struct watch_queue *wqueue) +{ + kref_init(&watch->usage); + INIT_HLIST_NODE(&watch->list_node); + INIT_HLIST_NODE(&watch->queue_node); + rcu_assign_pointer(watch->queue, wqueue); +} + +/** + * add_watch_to_object - Add a watch on an object to a watch list + * @watch: The watch to add + * @wlist: The watch list to add to + * + * @watch->queue must have been set to point to the queue to post notifications + * to and the watch list of the object to be watched. @watch->cred must also + * have been set to the appropriate credentials and a ref taken on them. + * + * The caller must pin the queue and the list both and must hold the list + * locked against racing watch additions/removals. + */ +int add_watch_to_object(struct watch *watch, struct watch_list *wlist) +{ + struct watch_queue *wqueue = rcu_access_pointer(watch->queue); + struct watch *w; + + hlist_for_each_entry(w, &wlist->watchers, list_node) { + struct watch_queue *wq = rcu_access_pointer(w->queue); + if (wqueue == wq && watch->id == w->id) + return -EBUSY; + } + + rcu_assign_pointer(watch->watch_list, wlist); + + spin_lock_bh(&wqueue->lock); + kref_get(&wqueue->usage); + hlist_add_head(&watch->queue_node, &wqueue->watches); + spin_unlock_bh(&wqueue->lock); + + hlist_add_head(&watch->list_node, &wlist->watchers); + return 0; +} +EXPORT_SYMBOL(add_watch_to_object); + +/** + * remove_watch_from_object - Remove a watch or all watches from an object. + * @wlist: The watch list to remove from + * @wq: The watch queue of interest (ignored if @all is true) + * @id: The ID of the watch to remove (ignored if @all is true) + * @all: True to remove all objects + * + * Remove a specific watch or all watches from an object. A notification is + * sent to the watcher to tell them that this happened.
+ */ +int remove_watch_from_object(struct watch_list *wlist, struct watch_queue *wq, + u64 id, bool all) +{ + struct watch_notification_removal n; + struct watch_queue *wqueue; + struct watch *watch; + int ret = -EBADSLT; + + rcu_read_lock(); + +again: + spin_lock(&wlist->lock); + hlist_for_each_entry(watch, &wlist->watchers, list_node) { + if (all || + (watch->id == id && rcu_access_pointer(watch->queue) == wq)) + goto found; + } + spin_unlock(&wlist->lock); + goto out; + +found: + ret = 0; + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + spin_unlock(&wlist->lock); + + /* We now own the reference on watch that used to belong to wlist. */ + + n.watch.type = WATCH_TYPE_META; + n.watch.subtype = WATCH_META_REMOVAL_NOTIFICATION; + n.watch.info = watch->info_id | watch_sizeof(n.watch); + n.id = id; + if (id != 0) + n.watch.info = watch->info_id | watch_sizeof(n); + + wqueue = rcu_dereference(watch->queue); + + /* We don't need the watch list lock for the next bit as RCU is + * protecting *wqueue from deallocation. + */ + if (wqueue) { + post_one_notification(wqueue, &n.watch); + + spin_lock_bh(&wqueue->lock); + + if (!hlist_unhashed(&watch->queue_node)) { + hlist_del_init_rcu(&watch->queue_node); + put_watch(watch); + } + + spin_unlock_bh(&wqueue->lock); + } + + if (wlist->release_watch) { + void (*release_watch)(struct watch *); + + release_watch = wlist->release_watch; + rcu_read_unlock(); + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + + if (all && !hlist_empty(&wlist->watchers)) + goto again; +out: + rcu_read_unlock(); + return ret; +} +EXPORT_SYMBOL(remove_watch_from_object); + +/* + * Remove all the watches that are contributory to a queue. This has the + * potential to race with removal of the watches by the destruction of the + * objects being watched or with the distribution of notifications.
+ */ +static void watch_queue_clear(struct watch_queue *wqueue) +{ + struct watch_list *wlist; + struct watch *watch; + bool release; + + rcu_read_lock(); + spin_lock_bh(&wqueue->lock); + + /* Prevent new additions and prevent notifications from happening */ + wqueue->defunct = true; + + while (!hlist_empty(&wqueue->watches)) { + watch = hlist_entry(wqueue->watches.first, struct watch, queue_node); + hlist_del_init_rcu(&watch->queue_node); + /* We now own a ref on the watch. */ + spin_unlock_bh(&wqueue->lock); + + /* We can't do the next bit under the queue lock as we need to + * get the list lock - which would cause a deadlock if someone + * was removing from the opposite direction at the same time or + * posting a notification. + */ + wlist = rcu_dereference(watch->watch_list); + if (wlist) { + void (*release_watch)(struct watch *); + + spin_lock(&wlist->lock); + + release = !hlist_unhashed(&watch->list_node); + if (release) { + hlist_del_init_rcu(&watch->list_node); + rcu_assign_pointer(watch->watch_list, NULL); + + /* We now own a second ref on the watch. */ + } + + release_watch = wlist->release_watch; + spin_unlock(&wlist->lock); + + if (release) { + if (release_watch) { + rcu_read_unlock(); + /* This might need to call dput(), so + * we have to drop all the locks. + */ + (*release_watch)(watch); + rcu_read_lock(); + } + put_watch(watch); + } + } + + put_watch(watch); + spin_lock_bh(&wqueue->lock); + } + + spin_unlock_bh(&wqueue->lock); + rcu_read_unlock(); +} + +/* + * Release the file. 
+ */ +static int watch_queue_release(struct inode *inode, struct file *file) +{ + struct watch_queue *wqueue = file->private_data; + int i; + + watch_queue_clear(wqueue); + + if (wqueue->buffer) + vunmap(wqueue->buffer); + + for (i = 0; i < wqueue->nr_pages; i++) { + ClearPageUptodate(wqueue->pages[i]); + wqueue->pages[i]->mapping = NULL; + __free_page(wqueue->pages[i]); + } + + kfree(wqueue->pages); + watch_queue_unaccount_mem(wqueue); + put_watch_queue(wqueue); + return 0; +} + +static const struct file_operations watch_queue_fops = { + .owner = THIS_MODULE, + .open = watch_queue_open, + .release = watch_queue_release, + .unlocked_ioctl = watch_queue_ioctl, + .poll = watch_queue_poll, + .mmap = watch_queue_mmap, + .llseek = no_llseek, +}; + +/** + * get_watch_queue - Get a watch queue from its file descriptor. + * @fd: The fd to query. + */ +struct watch_queue *get_watch_queue(int fd) +{ + struct watch_queue *wqueue = ERR_PTR(-EBADF); + struct fd f; + + f = fdget(fd); + if (f.file) { + wqueue = ERR_PTR(-EINVAL); + if (f.file->f_op == &watch_queue_fops) { + wqueue = f.file->private_data; + kref_get(&wqueue->usage); + } + fdput(f); + } + + return wqueue; +} +EXPORT_SYMBOL(get_watch_queue); + +static struct miscdevice watch_queue_dev = { + .minor = MISC_DYNAMIC_MINOR, + .name = "watch_queue", + .fops = &watch_queue_fops, + .mode = 0666, +}; +builtin_misc_device(watch_queue_dev); diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h new file mode 100644 index 000000000000..34d7915cc5b3 --- /dev/null +++ b/include/linux/watch_queue.h @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* User-mappable watch queue + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com) + * + * See Documentation/watch_queue.rst + */ + +#ifndef _LINUX_WATCH_QUEUE_H +#define _LINUX_WATCH_QUEUE_H + +#include <uapi/linux/watch_queue.h> +#include <linux/kref.h> +#include <linux/rcupdate.h> + +#ifdef CONFIG_WATCH_QUEUE + +struct watch_queue; +struct cred; + +/* + * Representation of a watch on an object. + */ +struct watch { + union { + struct rcu_head rcu; + u32 info_id; /* ID to be OR'd in to info field */ + }; + struct watch_queue __rcu *queue; /* Queue to post events to */ + struct hlist_node queue_node; /* Link in queue->watches */ + struct watch_list __rcu *watch_list; + struct hlist_node list_node; /* Link in watch_list->watchers */ + const struct cred *cred; /* Creds of the owner of the watch */ + void *private; /* Private data for the watched object */ + u64 id; /* Internal identifier */ + struct kref usage; /* Object usage count */ +}; + +/* + * List of watches on an object. + */ +struct watch_list { + struct rcu_head rcu; + struct hlist_head watchers; + void (*release_watch)(struct watch *); + spinlock_t lock; +}; + +extern void __post_watch_notification(struct watch_list *, + struct watch_notification *, + const struct cred *, + u64); +extern struct watch_queue *get_watch_queue(int); +extern void put_watch_queue(struct watch_queue *); +extern void init_watch(struct watch *, struct watch_queue *); +extern int add_watch_to_object(struct watch *, struct watch_list *); +extern int remove_watch_from_object(struct watch_list *, struct watch_queue *, u64, bool); + +static inline void init_watch_list(struct watch_list *wlist, + void (*release_watch)(struct watch *)) +{ + INIT_HLIST_HEAD(&wlist->watchers); + spin_lock_init(&wlist->lock); + wlist->release_watch = release_watch; +} + +static inline void post_watch_notification(struct watch_list *wlist, + struct watch_notification *n, + const struct cred *cred, + u64 id) +{ + if (unlikely(wlist)) + __post_watch_notification(wlist, n, cred, id); +} + 
+static inline void remove_watch_list(struct watch_list *wlist, u64 id) +{ + if (wlist) { + remove_watch_from_object(wlist, NULL, id, true); + kfree_rcu(wlist, rcu); + } +} + +/** + * watch_sizeof - Calculate the information part of the size of a watch record, + * given the structure size. + */ +#define watch_sizeof(STRUCT) \ + ((sizeof(STRUCT) / WATCH_LENGTH_GRANULARITY) << WATCH_INFO_LENGTH__SHIFT) + +#endif + +#endif /* _LINUX_WATCH_QUEUE_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 70f575099968..3f0e09ed6963 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -3,6 +3,10 @@ #define _UAPI_LINUX_WATCH_QUEUE_H #include <linux/types.h> +#include <linux/ioctl.h> + +#define IOC_WATCH_QUEUE_SET_SIZE _IO('W', 0x60) /* Set the size in pages */ +#define IOC_WATCH_QUEUE_SET_FILTER _IO('W', 0x61) /* Set the filter */ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ @@ -64,4 +68,34 @@ struct watch_queue_buffer { */ #define WATCH_INFO_NOTIFICATIONS_LOST WATCH_INFO_FLAG_0 +/* + * Notification filtering rules (IOC_WATCH_QUEUE_SET_FILTER). + */ +struct watch_notification_type_filter { + __u32 type; /* Type to apply filter to */ + __u32 info_filter; /* Filter on watch_notification::info */ + __u32 info_mask; /* Mask of relevant bits in info_filter */ + __u32 subtype_filter[8]; /* Bitmask of subtypes to filter on */ +}; + +struct watch_notification_filter { + __u32 nr_filters; /* Number of filters */ + __u32 __reserved; /* Must be 0 */ + struct watch_notification_type_filter filters[]; +}; + +/* + * Extended watch removal notification. This is used optionally if the type + * wants to indicate an identifier for the object being watched, if there is + * such. This can be distinguished by the length. 
+ * + * type -> WATCH_TYPE_META + * subtype -> WATCH_META_REMOVAL_NOTIFICATION + * length -> 2 * gran + */ +struct watch_notification_removal { + struct watch_notification watch; + __u64 id; /* Type-dependent identifier */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
* [PATCH 05/11] keys: Add a notification facility [ver #6]
From: David Howells @ 2019-08-29 18:30 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, a watch queue needs to be created. The size is given in pages and
must be a power of two:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, 1 << n);

then a watch can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

After that, records will be placed into the queue when events occur in
which keys are changed in some way. Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the third argument to
	keyctl_watch_key(), shifted.

	n->key_id will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key_id in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.
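
As a rough sketch of how a reader of the buffer might pick the record length and the watch ID out of the info field: the EX_-prefixed names are hypothetical stand-ins, and the mask and shift values are assumptions made for illustration only; the real definitions are in include/uapi/linux/watch_queue.h and may differ. The only thing taken from the patch is that the length is counted in 8-byte units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; check include/uapi/linux/watch_queue.h. */
#define EX_INFO_LENGTH		0x0000003fu	/* Record length, in granules */
#define EX_INFO_LENGTH_SHIFT	0
#define EX_INFO_ID		0x0000ff00u	/* ID given when the watch was set */
#define EX_INFO_ID_SHIFT	8
#define EX_LENGTH_GRANULARITY	8u		/* sizeof(__u64) */

/* Hypothetical userspace stand-ins for the uapi structures. */
struct ex_watch_notification {
	uint32_t type:24;
	uint32_t subtype:8;
	uint32_t info;
};

struct ex_key_notification {
	struct ex_watch_notification watch;
	uint32_t key_id;
	uint32_t aux;
};

/* Build an info word from a record size in bytes and a watch ID (0..255). */
static inline uint32_t ex_make_info(size_t bytes, uint32_t watch_id)
{
	assert(bytes % EX_LENGTH_GRANULARITY == 0);
	assert(watch_id <= 0xff);
	return (uint32_t)((bytes / EX_LENGTH_GRANULARITY) << EX_INFO_LENGTH_SHIFT) |
	       (watch_id << EX_INFO_ID_SHIFT);
}

/* Length of the record in bytes, as encoded in the info word. */
static inline size_t ex_record_bytes(uint32_t info)
{
	return ((info & EX_INFO_LENGTH) >> EX_INFO_LENGTH_SHIFT) *
		EX_LENGTH_GRANULARITY;
}

/* Watch ID that was supplied when the watch was created. */
static inline uint32_t ex_watch_id(uint32_t info)
{
	return (info & EX_INFO_ID) >> EX_INFO_ID_SHIFT;
}
```

With these assumed values, a key_notification occupies two granules, so a consumer would advance by ex_record_bytes(n->watch.info) bytes to reach the next record rather than by sizeof(*n), which is what allows records of differing length to share one buffer.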
Note also that the queue can be shared between multiple notifications of various types. Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 +++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++ security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 103 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 275 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. 
+ + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + WATCH_META_REMOVAL_NOTIFICATION. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services =============== diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_subtype + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 ---
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } 
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..684d60228ac0 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ? 
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,94 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + watch->cred = get_current_cred(); + + ret = security_watch_key(watch, key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + if (watch) { + put_cred(watch->cred); + kfree(watch); + } +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities. 
*/ @@ -1857,6 +1951,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); + 
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed; ^ permalink raw reply related [flat|nested] 234+ messages in thread
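Editorial sketch (not part of the patch): the argument handling in the keyctl.c hunks above can be illustrated in plain userspace C. This is a minimal sketch under stated assumptions, not the kernel code. The capability bit and its byte index come from the keyctl.h and keyctl.c hunks, and the range check and shift mirror keyctl_watch_key(). EX_WATCH_INFO_ID_SHIFT is a hypothetical placeholder, since the value of WATCH_INFO_ID__SHIFT is not shown in this patch, and all ex_* names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the include/uapi/linux/keyctl.h hunk in this patch. */
#define KEYCTL_CAPS1_NOTIFICATIONS 0x04

/* ASSUMPTION: placeholder for WATCH_INFO_ID__SHIFT, which is defined
 * elsewhere in the series; 16 is chosen only for this sketch. */
#define EX_WATCH_INFO_ID_SHIFT 16

/* Returns 1 if a capabilities buffer (as filled in by KEYCTL_CAPABILITIES)
 * advertises key notifications in byte [1], 0 otherwise. */
int ex_caps_has_notifications(const unsigned char *caps, size_t len)
{
	if (len < 2)
		return 0;	/* byte [1] was not returned */
	return (caps[1] & KEYCTL_CAPS1_NOTIFICATIONS) != 0;
}

/* Mirrors the range check at the top of keyctl_watch_key():
 * -1 requests removal, 0..0xff installs a watch with that ID. */
int ex_check_watch_id(int watch_id)
{
	if (watch_id < -1 || watch_id > 0xff)
		return -EINVAL;
	return 0;
}

/* Mirrors "watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT",
 * using the assumed shift above. */
uint32_t ex_pack_info_id(int watch_id)
{
	assert(watch_id >= 0 && watch_id <= 0xff);
	return (uint32_t)watch_id << EX_WATCH_INFO_ID_SHIFT;
}
```

A caller would typically check the capability first, then validate the watch ID before issuing KEYCTL_WATCH_KEY, so that an out-of-range ID is caught before the syscall returns -EINVAL.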
* [PATCH 05/11] keys: Add a notification facility [ver #6]
@ 2019-08-29 18:30 ` David Howells
  0 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-29 18:30 UTC (permalink / raw)
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a watch can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

After that, records will be placed into the queue when events occur in
which keys are changed in some way. Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the third argument to
	keyctl_watch_key() (the watch ID), shifted.

	n->key_id will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key_id in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype. Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 +++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++ security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 103 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 275 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. 
+ + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + watch_meta_removal_notification. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services =============== diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */ 
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_type + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 --- 
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } 
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..684d60228ac0 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ? 
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,94 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + watch->cred = get_current_cred(); + + ret = security_watch_key(watch, key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + if (watch) { + put_cred(watch->cred); + kfree(watch); + } +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities. 
*/ @@ -1857,6 +1951,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); + 
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed; ^ permalink raw reply related [flat|nested] 234+ messages in thread
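Editorial sketch (not part of the patch): the record semantics described in the commit message above can be illustrated with a small decoder. This is a sketch under stated assumptions. The subtype values are copied from enum key_notification_subtype in the watch_queue.h hunk, but struct ex_key_event is a simplified, hypothetical stand-in for struct key_notification (the layout of the leading struct watch_notification is not shown in this patch), and the ex_* names are invented. It shows how key_id and aux are read per subtype: for link and unlink events key_id names the watched keyring and aux the key, and for instantiation aux carries 0 or a negated error code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from enum key_notification_subtype in the patch. */
enum ex_key_subtype {
	EX_NOTIFY_KEY_INSTANTIATED = 0,
	EX_NOTIFY_KEY_UPDATED      = 1,
	EX_NOTIFY_KEY_LINKED       = 2,
	EX_NOTIFY_KEY_UNLINKED     = 3,
	EX_NOTIFY_KEY_CLEARED      = 4,
	EX_NOTIFY_KEY_REVOKED      = 5,
	EX_NOTIFY_KEY_INVALIDATED  = 6,
	EX_NOTIFY_KEY_SETATTR      = 7,
};

/* ASSUMPTION: simplified stand-in for struct key_notification; only the
 * fields needed for this illustration are kept. */
struct ex_key_event {
	uint8_t  subtype;
	uint32_t key_id;	/* the key/keyring affected */
	uint32_t aux;		/* per-subtype auxiliary data */
};

/* Map a subtype number to a short human-readable name. */
const char *ex_key_subtype_name(unsigned int subtype)
{
	static const char *const names[] = {
		"instantiated", "updated", "linked", "unlinked",
		"cleared", "revoked", "invalidated", "setattr",
	};

	if (subtype >= sizeof(names) / sizeof(names[0]))
		return "unknown";
	return names[subtype];
}

/* Format one event into buf; returns what snprintf() returns. */
int ex_describe_key_event(const struct ex_key_event *ev, char *buf, size_t len)
{
	switch (ev->subtype) {
	case EX_NOTIFY_KEY_LINKED:
	case EX_NOTIFY_KEY_UNLINKED:
		/* key_id is the watched keyring, aux the key (un)linked */
		return snprintf(buf, len, "keyring %u: key %u %s",
				(unsigned int)ev->key_id, (unsigned int)ev->aux,
				ex_key_subtype_name(ev->subtype));
	case EX_NOTIFY_KEY_INSTANTIATED:
		/* aux is 0 or a negated error code stored as a u32 */
		return snprintf(buf, len, "key %u: instantiated (result %d)",
				(unsigned int)ev->key_id, (int)(int32_t)ev->aux);
	default:
		return snprintf(buf, len, "key %u: %s",
				(unsigned int)ev->key_id,
				ex_key_subtype_name(ev->subtype));
	}
}
```

A real consumer would additionally have to extract the record length and watch ID from watch.info before trusting the payload; that step is omitted here because the mask values are not part of this patch.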
* [PATCH 05/11] keys: Add a notification facility [ver #6]
@ 2019-08-29 18:30 ` David Howells
  0 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-29 18:30 UTC (permalink / raw)
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner

Add a key/keyring change notification facility whereby notifications about
changes in key and keyring content and attributes can be received.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a watch can be set up to report notifications via that queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_KEY_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01);

After that, records will be placed into the queue when events occur in
which keys are changed in some way. Records are of the following format:

	struct key_notification {
		struct watch_notification watch;
		__u32	key_id;
		__u32	aux;
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_KEY_NOTIFY.

	n->watch.subtype will indicate the type of event, such as
	NOTIFY_KEY_REVOKED.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the third argument to
	keyctl_watch_key() (the watch ID), shifted.

	n->key_id will be the ID of the affected key.

	n->aux will hold subtype-dependent information, such as the key
	being linked into the keyring specified by n->key_id in the case of
	NOTIFY_KEY_LINKED.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype. Note also that
the queue can be shared between multiple notifications of various types.

Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/security/keys/core.rst | 58 +++++++++++++++++++ include/linux/key.h | 3 + include/uapi/linux/keyctl.h | 2 + include/uapi/linux/watch_queue.h | 28 +++++++++ security/keys/Kconfig | 9 +++ security/keys/compat.c | 3 + security/keys/gc.c | 5 ++ security/keys/internal.h | 30 ++++++++++ security/keys/key.c | 38 ++++++++----- security/keys/keyctl.c | 103 +++++++++++++++++++++++++++++++++- security/keys/keyring.c | 20 ++++--- security/keys/request_key.c | 4 + 12 files changed, 275 insertions(+), 28 deletions(-) diff --git a/Documentation/security/keys/core.rst b/Documentation/security/keys/core.rst index d6d8b0b756b6..957179f8cea9 100644 --- a/Documentation/security/keys/core.rst +++ b/Documentation/security/keys/core.rst @@ -833,6 +833,7 @@ The keyctl syscall functions are: A process must have search permission on the key for this function to be successful. + * Compute a Diffie-Hellman shared secret or public key:: long keyctl(KEYCTL_DH_COMPUTE, struct keyctl_dh_params *params, @@ -1026,6 +1027,63 @@ The keyctl syscall functions are: written into the output buffer. Verification returns 0 on success. + * Watch a key or keyring for changes:: + + long keyctl(KEYCTL_WATCH_KEY, key_serial_t key, int queue_fd, + const struct watch_notification_filter *filter); + + This will set or remove a watch for changes on the specified key or + keyring. + + "key" is the ID of the key to be watched. + + "queue_fd" is a file descriptor referring to an open "/dev/watch_queue" + which manages the buffer into which notifications will be delivered. + + "filter" is either NULL to remove a watch or a filter specification to + indicate what events are required from the key. + + See Documentation/watch_queue.rst for more information. + + Note that only one watch may be emplaced for any particular { key, + queue_fd } combination. 
+ + Notification records look like:: + + struct key_notification { + struct watch_notification watch; + __u32 key_id; + __u32 aux; + }; + + In this, watch::type will be "WATCH_TYPE_KEY_NOTIFY" and subtype will be + one of:: + + NOTIFY_KEY_INSTANTIATED + NOTIFY_KEY_UPDATED + NOTIFY_KEY_LINKED + NOTIFY_KEY_UNLINKED + NOTIFY_KEY_CLEARED + NOTIFY_KEY_REVOKED + NOTIFY_KEY_INVALIDATED + NOTIFY_KEY_SETATTR + + Where these indicate a key being instantiated/rejected, updated, a link + being made in a keyring, a link being removed from a keyring, a keyring + being cleared, a key being revoked, a key being invalidated or a key + having one of its attributes changed (user, group, perm, timeout, + restriction). + + If a watched key is deleted, a basic watch_notification will be issued + with "type" set to WATCH_TYPE_META and "subtype" set to + watch_meta_removal_notification. The watchpoint ID will be set in the + "info" field. + + This needs to be configured by enabling: + + "Provide key/keyring change notifications" (KEY_NOTIFICATIONS) + + Kernel Services =============== diff --git a/include/linux/key.h b/include/linux/key.h index 50028338a4cc..b897ef4f7030 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -176,6 +176,9 @@ struct key { struct list_head graveyard_link; struct rb_node serial_node; }; +#ifdef CONFIG_KEY_NOTIFICATIONS + struct watch_list *watchers; /* Entities watching this key for changes */ +#endif struct rw_semaphore sem; /* change vs change sem */ struct key_user *user; /* owner of this key */ void *security; /* security data for this key */ diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h index ed3d5893830d..4c8884eea808 100644 --- a/include/uapi/linux/keyctl.h +++ b/include/uapi/linux/keyctl.h @@ -69,6 +69,7 @@ #define KEYCTL_RESTRICT_KEYRING 29 /* Restrict keys allowed to link to a keyring */ #define KEYCTL_MOVE 30 /* Move keys between keyrings */ #define KEYCTL_CAPABILITIES 31 /* Find capabilities of keyrings subsystem */ 
+#define KEYCTL_WATCH_KEY 32 /* Watch a key or ring of keys for changes */ /* keyctl structures */ struct keyctl_dh_params { @@ -130,5 +131,6 @@ struct keyctl_pkey_params { #define KEYCTL_CAPS0_MOVE 0x80 /* KEYCTL_MOVE supported */ #define KEYCTL_CAPS1_NS_KEYRING_NAME 0x01 /* Keyring names are per-user_namespace */ #define KEYCTL_CAPS1_NS_KEY_TAG 0x02 /* Key indexing can include a namespace tag */ +#define KEYCTL_CAPS1_NOTIFICATIONS 0x04 /* Keys generate watchable notifications */ #endif /* _LINUX_KEYCTL_H */ diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 3f0e09ed6963..654d4ba8b909 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -10,7 +10,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ - WATCH_TYPE___NR = 1 + WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ + WATCH_TYPE___NR = 2 }; enum watch_meta_notification_subtype { @@ -98,4 +99,29 @@ struct watch_notification_removal { __u64 id; /* Type-dependent identifier */ }; +/* + * Type of key/keyring change notification. + */ +enum key_notification_subtype { + NOTIFY_KEY_INSTANTIATED = 0, /* Key was instantiated (aux is error code) */ + NOTIFY_KEY_UPDATED = 1, /* Key was updated */ + NOTIFY_KEY_LINKED = 2, /* Key (aux) was added to watched keyring */ + NOTIFY_KEY_UNLINKED = 3, /* Key (aux) was removed from watched keyring */ + NOTIFY_KEY_CLEARED = 4, /* Keyring was cleared */ + NOTIFY_KEY_REVOKED = 5, /* Key was revoked */ + NOTIFY_KEY_INVALIDATED = 6, /* Key was invalidated */ + NOTIFY_KEY_SETATTR = 7, /* Key's attributes got changed */ +}; + +/* + * Key/keyring notification record. 
+ * - watch.type = WATCH_TYPE_KEY_NOTIFY + * - watch.subtype = enum key_notification_subtype + */ +struct key_notification { + struct watch_notification watch; + __u32 key_id; /* The key/keyring affected */ + __u32 aux; /* Per-type auxiliary data */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ diff --git a/security/keys/Kconfig b/security/keys/Kconfig index dd313438fecf..20791a556b58 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -120,3 +120,12 @@ config KEY_DH_OPERATIONS in the kernel. If you are unsure as to whether this is required, answer N. + +config KEY_NOTIFICATIONS + bool "Provide key/keyring change notifications" + depends on KEYS && WATCH_QUEUE + help + This option provides support for getting change notifications on keys + and keyrings on which the caller has View permission. This makes use + of the /dev/watch_queue misc device to handle the notification + buffer and provides KEYCTL_WATCH_KEY to enable/disable watches. diff --git a/security/keys/compat.c b/security/keys/compat.c index 9bcc404131aa..ac5a4fd0d7ea 100644 --- a/security/keys/compat.c +++ b/security/keys/compat.c @@ -161,6 +161,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option, case KEYCTL_CAPABILITIES: return keyctl_capabilities(compat_ptr(arg2), arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key(arg2, arg3, arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/gc.c b/security/keys/gc.c index 671dd730ecfc..3c90807476eb 100644 --- a/security/keys/gc.c +++ b/security/keys/gc.c @@ -131,6 +131,11 @@ static noinline void key_gc_unused_keys(struct list_head *keys) kdebug("- %u", key->serial); key_check(key); +#ifdef CONFIG_KEY_NOTIFICATIONS + remove_watch_list(key->watchers, key->serial); + key->watchers = NULL; +#endif + /* Throw away the key data if the key is instantiated */ if (state == KEY_IS_POSITIVE && key->type->destroy) key->type->destroy(key); diff --git a/security/keys/internal.h b/security/keys/internal.h index c039373488bd..240f55c7b4a2 100644 ---
a/security/keys/internal.h +++ b/security/keys/internal.h @@ -15,6 +15,7 @@ #include <linux/task_work.h> #include <linux/keyctl.h> #include <linux/refcount.h> +#include <linux/watch_queue.h> #include <linux/compat.h> struct iovec; @@ -97,7 +98,8 @@ extern int __key_link_begin(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit **_edit); extern int __key_link_check_live_key(struct key *keyring, struct key *key); -extern void __key_link(struct key *key, struct assoc_array_edit **_edit); +extern void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit); extern void __key_link_end(struct key *keyring, const struct keyring_index_key *index_key, struct assoc_array_edit *edit); @@ -181,6 +183,23 @@ extern int key_task_permission(const key_ref_t key_ref, const struct cred *cred, key_perm_t perm); +static inline void notify_key(struct key *key, + enum key_notification_subtype subtype, u32 aux) +{ +#ifdef CONFIG_KEY_NOTIFICATIONS + struct key_notification n = { + .watch.type = WATCH_TYPE_KEY_NOTIFY, + .watch.subtype = subtype, + .watch.info = watch_sizeof(n), + .key_id = key_serial(key), + .aux = aux, + }; + + post_watch_notification(key->watchers, &n.watch, current_cred(), + n.key_id); +#endif +} + /* * Check to see whether permission is granted to use a key in the desired way. 
*/ @@ -331,6 +350,15 @@ static inline long keyctl_pkey_e_d_s(int op, extern long keyctl_capabilities(unsigned char __user *_buffer, size_t buflen); +#ifdef CONFIG_KEY_NOTIFICATIONS +extern long keyctl_watch_key(key_serial_t, int, int); +#else +static inline long keyctl_watch_key(key_serial_t key_id, int watch_fd, int watch_id) +{ + return -EOPNOTSUPP; +} +#endif + /* * Debugging key validation */ diff --git a/security/keys/key.c b/security/keys/key.c index 764f4c57913e..83e8d7c4bb6f 100644 --- a/security/keys/key.c +++ b/security/keys/key.c @@ -443,6 +443,7 @@ static int __key_instantiate_and_link(struct key *key, /* mark the key as being instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_INSTANTIATED, 0); if (test_and_clear_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags)) awaken = 1; @@ -452,7 +453,7 @@ static int __key_instantiate_and_link(struct key *key, if (test_bit(KEY_FLAG_KEEP, &keyring->flags)) set_bit(KEY_FLAG_KEEP, &key->flags); - __key_link(key, _edit); + __key_link(keyring, key, _edit); } /* disable the authorisation key */ @@ -600,6 +601,7 @@ int key_reject_and_link(struct key *key, /* mark the key as being negatively instantiated */ atomic_inc(&key->user->nikeys); mark_key_instantiated(key, -error); + notify_key(key, NOTIFY_KEY_INSTANTIATED, -error); key->expiry = ktime_get_real_seconds() + timeout; key_schedule_gc(key->expiry + key_gc_delay); @@ -610,7 +612,7 @@ int key_reject_and_link(struct key *key, /* and link it into the destination keyring */ if (keyring && link_ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); /* disable the authorisation key */ if (authkey) @@ -763,9 +765,11 @@ static inline key_ref_t __key_update(key_ref_t key_ref, down_write(&key->sem); ret = key->type->update(key, prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + }
up_write(&key->sem); @@ -1013,9 +1017,11 @@ int key_update(key_ref_t key_ref, const void *payload, size_t plen) down_write(&key->sem); ret = key->type->update(key, &prep); - if (ret == 0) + if (ret == 0) { /* Updating a negative key positively instantiates it */ mark_key_instantiated(key, 0); + notify_key(key, NOTIFY_KEY_UPDATED, 0); + } up_write(&key->sem); @@ -1047,15 +1053,17 @@ void key_revoke(struct key *key) * instantiated */ down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) && - key->type->revoke) - key->type->revoke(key); - - /* set the death time to no more than the expiry time */ - time = ktime_get_real_seconds(); - if (key->revoked_at == 0 || key->revoked_at > time) { - key->revoked_at = time; - key_schedule_gc(key->revoked_at + key_gc_delay); + if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags)) { + notify_key(key, NOTIFY_KEY_REVOKED, 0); + if (key->type->revoke) + key->type->revoke(key); + + /* set the death time to no more than the expiry time */ + time = ktime_get_real_seconds(); + if (key->revoked_at == 0 || key->revoked_at > time) { + key->revoked_at = time; + key_schedule_gc(key->revoked_at + key_gc_delay); + } } up_write(&key->sem); @@ -1077,8 +1085,10 @@ void key_invalidate(struct key *key) if (!test_bit(KEY_FLAG_INVALIDATED, &key->flags)) { down_write_nested(&key->sem, 1); - if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) + if (!test_and_set_bit(KEY_FLAG_INVALIDATED, &key->flags)) { + notify_key(key, NOTIFY_KEY_INVALIDATED, 0); key_schedule_gc_links(); + } up_write(&key->sem); } } diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 9b898c969558..684d60228ac0 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -37,7 +37,9 @@ static const unsigned char keyrings_capabilities[2] = { KEYCTL_CAPS0_MOVE ), [1] = (KEYCTL_CAPS1_NS_KEYRING_NAME | - KEYCTL_CAPS1_NS_KEY_TAG), + KEYCTL_CAPS1_NS_KEY_TAG | + (IS_ENABLED(CONFIG_KEY_NOTIFICATIONS) ?
KEYCTL_CAPS1_NOTIFICATIONS : 0) + ), }; static int key_get_type_from_user(char *type, @@ -970,6 +972,7 @@ long keyctl_chown_key(key_serial_t id, uid_t user, gid_t group) if (group != (gid_t) -1) key->gid = gid; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; error_put: @@ -1020,6 +1023,7 @@ long keyctl_setperm_key(key_serial_t id, key_perm_t perm) /* if we're not the sysadmin, we can only change a key that we own */ if (capable(CAP_SYS_ADMIN) || uid_eq(key->uid, current_fsuid())) { key->perm = perm; + notify_key(key, NOTIFY_KEY_SETATTR, 0); ret = 0; } @@ -1411,10 +1415,12 @@ long keyctl_set_timeout(key_serial_t id, unsigned timeout) okay: key = key_ref_to_ptr(key_ref); ret = 0; - if (test_bit(KEY_FLAG_KEEP, &key->flags)) + if (test_bit(KEY_FLAG_KEEP, &key->flags)) { ret = -EPERM; - else + } else { key_set_timeout(key, timeout); + notify_key(key, NOTIFY_KEY_SETATTR, 0); + } key_put(key); error: @@ -1688,6 +1694,94 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, return ret; } +#ifdef CONFIG_KEY_NOTIFICATIONS +/* + * Watch for changes to a key. + * + * The caller must have View permission to watch a key or keyring. 
+ */ +long keyctl_watch_key(key_serial_t id, int watch_queue_fd, int watch_id) +{ + struct watch_queue *wqueue; + struct watch_list *wlist = NULL; + struct watch *watch = NULL; + struct key *key; + key_ref_t key_ref; + long ret; + + if (watch_id < -1 || watch_id > 0xff) + return -EINVAL; + + key_ref = lookup_user_key(id, KEY_LOOKUP_CREATE, KEY_NEED_VIEW); + if (IS_ERR(key_ref)) + return PTR_ERR(key_ref); + key = key_ref_to_ptr(key_ref); + + wqueue = get_watch_queue(watch_queue_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err_key; + } + + if (watch_id >= 0) { + ret = -ENOMEM; + if (!key->watchers) { + wlist = kzalloc(sizeof(*wlist), GFP_KERNEL); + if (!wlist) + goto err_wqueue; + init_watch_list(wlist, NULL); + } + + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wlist; + + init_watch(watch, wqueue); + watch->id = key->serial; + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + watch->cred = get_current_cred(); + + ret = security_watch_key(watch, key); + if (ret < 0) + goto err_watch; + + down_write(&key->sem); + if (!key->watchers) { + key->watchers = wlist; + wlist = NULL; + } + + ret = add_watch_to_object(watch, key->watchers); + up_write(&key->sem); + + if (ret == 0) + watch = NULL; + } else { + ret = -EBADSLT; + if (key->watchers) { + down_write(&key->sem); + ret = remove_watch_from_object(key->watchers, + wqueue, key_serial(key), + false); + up_write(&key->sem); + } + } + +err_watch: + if (watch) { + put_cred(watch->cred); + kfree(watch); + } +err_wlist: + kfree(wlist); +err_wqueue: + put_watch_queue(wqueue); +err_key: + key_put(key); + return ret; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ + /* * Get keyrings subsystem capabilities.
*/ @@ -1857,6 +1951,9 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3, case KEYCTL_CAPABILITIES: return keyctl_capabilities((unsigned char __user *)arg2, (size_t)arg3); + case KEYCTL_WATCH_KEY: + return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4); + default: return -EOPNOTSUPP; } diff --git a/security/keys/keyring.c b/security/keys/keyring.c index febf36c6ddc5..40a0dcdfda44 100644 --- a/security/keys/keyring.c +++ b/security/keys/keyring.c @@ -1060,12 +1060,14 @@ int keyring_restrict(key_ref_t keyring_ref, const char *type, down_write(&keyring->sem); down_write(&keyring_serialise_restrict_sem); - if (keyring->restrict_link) + if (keyring->restrict_link) { ret = -EEXIST; - else if (keyring_detect_restriction_cycle(keyring, restrict_link)) + } else if (keyring_detect_restriction_cycle(keyring, restrict_link)) { ret = -EDEADLK; - else + } else { keyring->restrict_link = restrict_link; + notify_key(keyring, NOTIFY_KEY_SETATTR, 0); + } up_write(&keyring_serialise_restrict_sem); up_write(&keyring->sem); @@ -1366,12 +1368,14 @@ int __key_link_check_live_key(struct key *keyring, struct key *key) * holds at most one link to any given key of a particular type+description * combination. 
*/ -void __key_link(struct key *key, struct assoc_array_edit **_edit) +void __key_link(struct key *keyring, struct key *key, + struct assoc_array_edit **_edit) { __key_get(key); assoc_array_insert_set_object(*_edit, keyring_key_to_ptr(key)); assoc_array_apply_edit(*_edit); *_edit = NULL; + notify_key(keyring, NOTIFY_KEY_LINKED, key_serial(key)); } /* @@ -1455,7 +1459,7 @@ int key_link(struct key *keyring, struct key *key) if (ret == 0) ret = __key_link_check_live_key(keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(keyring, key, &edit); error_end: __key_link_end(keyring, &key->index_key, edit); @@ -1487,7 +1491,7 @@ static int __key_unlink_begin(struct key *keyring, struct key *key, struct assoc_array_edit *edit; BUG_ON(*_edit != NULL); - + edit = assoc_array_delete(&keyring->keys, &keyring_assoc_array_ops, &key->index_key); if (IS_ERR(edit)) @@ -1507,6 +1511,7 @@ static void __key_unlink(struct key *keyring, struct key *key, struct assoc_array_edit **_edit) { assoc_array_apply_edit(*_edit); + notify_key(keyring, NOTIFY_KEY_UNLINKED, key_serial(key)); *_edit = NULL; key_payload_reserve(keyring, keyring->datalen - KEYQUOTA_LINK_BYTES); } @@ -1625,7 +1630,7 @@ int key_move(struct key *key, goto error; __key_unlink(from_keyring, key, &from_edit); - __key_link(key, &to_edit); + __key_link(to_keyring, key, &to_edit); error: __key_link_end(to_keyring, &key->index_key, to_edit); __key_unlink_end(from_keyring, key, from_edit); @@ -1659,6 +1664,7 @@ int keyring_clear(struct key *keyring) } else { if (edit) assoc_array_apply_edit(edit); + notify_key(keyring, NOTIFY_KEY_CLEARED, 0); key_payload_reserve(keyring, 0); ret = 0; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 7325f382dbf4..430f24a461f5 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -418,7 +418,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, goto key_already_present; if (dest_keyring) - __key_link(key, &edit); +
__key_link(dest_keyring, key, &edit); mutex_unlock(&key_construction_mutex); if (dest_keyring) @@ -437,7 +437,7 @@ static int construct_alloc_key(struct keyring_search_context *ctx, if (dest_keyring) { ret = __key_link_check_live_key(dest_keyring, key); if (ret == 0) - __key_link(key, &edit); + __key_link(dest_keyring, key, &edit); __key_link_end(dest_keyring, &ctx->index_key, edit); if (ret < 0) goto link_check_failed; ^ permalink raw reply related [flat|nested] 234+ messages in thread
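As an illustration of how a consumer might interpret the key records described in this patch, here is a minimal userspace sketch. The subtype values are copied from the patch's enum key_notification_subtype, but everything else is an assumption made for the example: struct demo_key_event is a simplified stand-in for the payload of struct key_notification (the real record starts with a struct watch_notification header that is not reproduced here), and key_subtype_name() and describe_key_event() are hypothetical helper names, not part of the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Subtype values as listed in the patch's enum key_notification_subtype. */
enum key_notification_subtype {
	NOTIFY_KEY_INSTANTIATED	= 0,
	NOTIFY_KEY_UPDATED	= 1,
	NOTIFY_KEY_LINKED	= 2,
	NOTIFY_KEY_UNLINKED	= 3,
	NOTIFY_KEY_CLEARED	= 4,
	NOTIFY_KEY_REVOKED	= 5,
	NOTIFY_KEY_INVALIDATED	= 6,
	NOTIFY_KEY_SETATTR	= 7,
};

/*
 * Hypothetical, simplified view of one key notification. The real
 * struct key_notification carries a struct watch_notification header
 * in front of key_id and aux; that header is deliberately left out.
 */
struct demo_key_event {
	uint8_t  subtype;	/* enum key_notification_subtype */
	uint32_t key_id;	/* the key/keyring affected */
	uint32_t aux;		/* per-subtype auxiliary data */
};

/* Map a subtype to a short name; anything out of range is "unknown". */
static const char *key_subtype_name(unsigned int subtype)
{
	static const char *const names[] = {
		[NOTIFY_KEY_INSTANTIATED] = "instantiated",
		[NOTIFY_KEY_UPDATED]      = "updated",
		[NOTIFY_KEY_LINKED]       = "linked",
		[NOTIFY_KEY_UNLINKED]     = "unlinked",
		[NOTIFY_KEY_CLEARED]      = "cleared",
		[NOTIFY_KEY_REVOKED]      = "revoked",
		[NOTIFY_KEY_INVALIDATED]  = "invalidated",
		[NOTIFY_KEY_SETATTR]      = "setattr",
	};

	if (subtype >= sizeof(names) / sizeof(names[0]))
		return "unknown";
	return names[subtype];
}

/* Format a one-line description of the event into buf. */
static int describe_key_event(const struct demo_key_event *ev,
			      char *buf, size_t len)
{
	assert(ev && buf);

	switch (ev->subtype) {
	case NOTIFY_KEY_LINKED:
	case NOTIFY_KEY_UNLINKED:
		/* For link changes, key_id is the keyring and aux the key. */
		return snprintf(buf, len, "keyring %u: key %u %s",
				(unsigned int)ev->key_id,
				(unsigned int)ev->aux,
				key_subtype_name(ev->subtype));
	case NOTIFY_KEY_INSTANTIATED:
		/* aux holds the (possibly negative) instantiation error. */
		return snprintf(buf, len, "key %u: instantiated (error %d)",
				(unsigned int)ev->key_id, (int)(int32_t)ev->aux);
	default:
		return snprintf(buf, len, "key %u: %s",
				(unsigned int)ev->key_id,
				key_subtype_name(ev->subtype));
	}
}
```

The switch mirrors the comments in the uapi header: only the link and unlink subtypes treat aux as a key serial, and only instantiation treats it as an error code.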
* [PATCH 06/11] Add a general, global device notification watch list [ver #6] 2019-08-29 18:29 ` David Howells (?) @ 2019-08-29 18:30 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:30 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Create a general, global watch list that can be used for the posting of device notification events, for such things as device attachment, detachment and errors on sources such as block devices and USB devices. This can be enabled with: CONFIG_DEVICE_NOTIFICATIONS To add a watch on this list, an event queue must be created and configured: fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); and then a watch can be placed upon it using a system call: watch_devices(fd, 12, 0); Unless the application wants to receive all events, it should employ appropriate filters.
For example, to receive just USB notifications, it could do: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_USB_NOTIFY, .subtype_filter[0] = UINT_MAX, }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/watch_queue.rst | 22 ++++++ arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/xtensa/kernel/syscalls/syscall.tbl | 1 drivers/base/Kconfig | 9 +++ drivers/base/Makefile | 1 drivers/base/watch.c | 94 +++++++++++++++++++++++++++ include/linux/device.h | 7 ++ include/linux/syscalls.h | 1 include/uapi/asm-generic/unistd.h | 4 + kernel/sys_ni.c | 1 24 files changed, 153 insertions(+), 2 deletions(-) create mode 100644 drivers/base/watch.c diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 6fb3aa3356d3..393905b904c8 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -276,6 +276,25 @@ The ``id`` is the ID of the source object (such as the serial number on a key). Only watches that have the same ID set in them will see this notification. +Global Device Watch List +======================== + +There is a global watch list that hardware generated events, such as device +connection, disconnection, failure and error can be posted upon.
It must be +enabled using:: + + CONFIG_DEVICE_NOTIFICATIONS + +Watchpoints are set in userspace using the watch_devices(2) system call. +Within the kernel events are posted upon it using:: + + void post_device_notification(struct watch_notification *n, u64 id); + +where ``n`` is the formatted notification record to post. ``id`` is an +identifier that can be used to direct to specific watches, but it should be 0 +for general use on this queue. + + Watch Sources ============= @@ -291,7 +310,8 @@ Any particular buffer can be fed from multiple sources. Sources include: * WATCH_TYPE_BLOCK_NOTIFY Notifications of this type indicate block layer events, such as I/O errors - or temporary link loss. Watches of this type are set on a global queue. + or temporary link loss. Watches of this type are set on the global device + watch list. Event Filtering diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 728fe028c02c..8e841d8e4c22 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -475,3 +475,4 @@ 543 common fspick sys_fspick 544 common pidfd_open sys_pidfd_open # 545 reserved for clone3 +546 common watch_devices sys_watch_devices diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6da7dc4d79cc..0f080cf44cc9 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -449,3 +449,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 36d5faf4c86c..2f33f5db2fed 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -356,3 +356,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/m68k/kernel/syscalls/syscall.tbl
b/arch/m68k/kernel/syscalls/syscall.tbl index a88a285a0e5f..83e4e8784b88 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -435,3 +435,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 09b0cd7dab0a..9a70a3be3b7b 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index c9c879ec9b6d..2ba5b649f0ab 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -374,3 +374,4 @@ 433 n32 fspick sys_fspick 434 n32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n32 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index bbce9159caa1..ff350988584d 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -350,3 +350,4 @@ 433 n64 fspick sys_fspick 434 n64 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n64 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 9653591428ec..7b26bd39900e 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -423,3 +423,4 @@ 433 o32 fspick sys_fspick 434 o32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 o32 watch_devices sys_watch_devices diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 670d1371aca1..d846365a4f7c 
100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -432,3 +432,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3_wrapper +436 common watch_devices sys_watch_devices diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 43f736ed47f2..0a503239ab5c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -517,3 +517,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 nospu clone3 ppc_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3054e9c035a3..19b43c0d928a 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick sys_fspick 434 common pidfd_open sys_pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 sys_clone3 +436 common watch_devices sys_watch_devices sys_watch_devices diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index b5ed26c4c005..b454e07c9372 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8c8cc7537fb2..8ef43c27457e 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -481,3 +481,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index c00019abd076..0e34ddeb97a1 100644 --- 
a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -440,3 +440,4 @@ 433 i386 fspick sys_fspick __ia32_sys_fspick 434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open 435 i386 clone3 sys_clone3 __ia32_sys_clone3 +436 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c29976eca4a8..29293d103829 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -357,6 +357,7 @@ 433 common fspick __x64_sys_fspick 434 common pidfd_open __x64_sys_pidfd_open 435 common clone3 __x64_sys_clone3/ptregs +436 common watch_devices __x64_sys_watch_devices # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 25f4de729a6d..243fa18b8d1e 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -406,3 +406,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index dc404492381d..7f899cae41a0 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -1,6 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 menu "Generic Driver Options" +config DEVICE_NOTIFICATIONS + bool "Provide device event notifications" + depends on WATCH_QUEUE + help + This option provides support for getting hardware event notifications + on devices, buses and interfaces. This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + watch_devices(2) is used to set/remove watches.
+ config UEVENT_HELPER bool "Support for uevent helper" help diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..4db2e8f1a1f4 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -7,6 +7,7 @@ obj-y := component.o core.o bus.o dd.o syscore.o \ attribute_container.o transport_class.o \ topology.o container.o property.o cacheinfo.o \ devcon.o swnode.o +obj-$(CONFIG_DEVICE_NOTIFICATIONS) += watch.o obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o diff --git a/drivers/base/watch.c b/drivers/base/watch.c new file mode 100644 index 000000000000..879f82225979 --- /dev/null +++ b/drivers/base/watch.c @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Event notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/device.h> +#include <linux/watch_queue.h> +#include <linux/syscalls.h> +#include <linux/init_task.h> +#include <linux/security.h> + +/* + * Global queue for watching for device layer events. + */ +static struct watch_list device_watchers = { + .watchers = HLIST_HEAD_INIT, + .lock = __SPIN_LOCK_UNLOCKED(&device_watchers.lock), +}; + +static DEFINE_SPINLOCK(device_watchers_lock); + +/** + * post_device_notification - Post notification of a device event + * @n - The notification to post + * @id - The device ID + * + * Note that there's only a global queue to which all events are posted. Might + * want to provide per-dev queues also. + */ +void post_device_notification(struct watch_notification *n, u64 id) +{ + post_watch_notification(&device_watchers, n, &init_cred, id); +} +EXPORT_SYMBOL(post_device_notification); + +/** + * sys_watch_devices - Watch for device events. + * @watch_fd: The watch queue to send notifications to. 
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch) + * @flags: Flags (reserved for future) + */ +SYSCALL_DEFINE3(watch_devices, int, watch_fd, int, watch_id, unsigned int, flags) +{ + struct watch_queue *wqueue; + struct watch *watch = NULL; + long ret = -ENOMEM; + + if (watch_id < -1 || watch_id > 0xff || flags) + return -EINVAL; + + wqueue = get_watch_queue(watch_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err; + } + + if (watch_id >= 0) { + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wqueue; + + init_watch(watch, wqueue); + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + watch->cred = get_current_cred(); + + ret = security_watch_devices(watch); + if (ret < 0) + goto err_watch; + + spin_lock(&device_watchers_lock); + ret = add_watch_to_object(watch, &device_watchers); + spin_unlock(&device_watchers_lock); + if (ret == 0) + watch = NULL; + } else { + spin_lock(&device_watchers_lock); + ret = remove_watch_from_object(&device_watchers, wqueue, 0, + false); + spin_unlock(&device_watchers_lock); + } + +err_watch: + if (watch) { + put_cred(watch->cred); + kfree(watch); + } +err_wqueue: + put_watch_queue(wqueue); +err: + return ret; +} diff --git a/include/linux/device.h b/include/linux/device.h index 6717adee33f0..9def6a53b598 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -43,6 +43,7 @@ struct iommu_group; struct iommu_fwspec; struct dev_pin_info; struct iommu_param; +struct watch_notification; struct bus_attribute { struct attribute attr; @@ -1412,6 +1413,12 @@ struct device_link *device_link_add(struct device *consumer, void device_link_del(struct device_link *link); void device_link_remove(void *consumer, struct device *supplier); +#ifdef CONFIG_DEVICE_NOTIFICATIONS +extern void post_device_notification(struct watch_notification *n, u64 id); +#else +static inline void post_device_notification(struct watch_notification *n, u64 id) {} +#endif + #ifndef dev_fmt 
#define dev_fmt(fmt) fmt #endif diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 88145da7d140..5bac5daec51e 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags) asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); +asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1be0e798e362..fd63ff0196fd 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) #endif +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) #undef __NR_syscalls -#define __NR_syscalls 436 +#define __NR_syscalls 437 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 34b76895b81e..184ad68c087f 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents); COND_SYSCALL(io_uring_setup); COND_SYSCALL(io_uring_enter); COND_SYSCALL(io_uring_register); +COND_SYSCALL(watch_devices); /* fs/xattr.c */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
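The argument rules that sys_watch_devices() applies can be summarised in a small standalone sketch. This is only an illustration of the checks visible in drivers/base/watch.c above, not kernel code: demo_check_watch_args() and enum demo_watch_op are hypothetical names, and DEMO_INFO_ID_SHIFT is an assumed stand-in for WATCH_INFO_ID__SHIFT, whose real value comes from the uapi header and is not shown in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Assumption: placeholder for the kernel's WATCH_INFO_ID__SHIFT. The
 * value 16 is chosen for the example only.
 */
#define DEMO_INFO_ID_SHIFT 16

enum demo_watch_op {
	DEMO_WATCH_ADD,		/* watch_id in 0..255: install a watch */
	DEMO_WATCH_REMOVE,	/* watch_id == -1: remove the watch */
};

/*
 * Mirror the argument validation done by the syscall:
 *  - watch_id must be -1 (remove) or 0..0xff (add),
 *  - flags is reserved and must be zero.
 * On success, returns 0 and reports the operation plus the value that
 * would be stored in watch->info_id; otherwise returns -EINVAL.
 */
static int demo_check_watch_args(int watch_id, unsigned int flags,
				 enum demo_watch_op *op, uint32_t *info_id)
{
	assert(op && info_id);

	if (watch_id < -1 || watch_id > 0xff || flags)
		return -EINVAL;

	if (watch_id >= 0) {
		*op = DEMO_WATCH_ADD;
		*info_id = (uint32_t)watch_id << DEMO_INFO_ID_SHIFT;
	} else {
		*op = DEMO_WATCH_REMOVE;
		*info_id = 0;
	}
	return 0;
}
```

With these rules, the watch_devices(fd, 12, 0) call from the commit message installs a watch whose records are tagged with ID 12, and watch_devices(fd, -1, 0) removes it again.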
* [PATCH 06/11] Add a general, global device notification watch list [ver #6] @ 2019-08-29 18:30 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:30 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Create a general, global watch list that can be used for the posting of device notification events, for such things as device attachment, detachment and errors on sources such as block devices and USB devices. This can be enabled with: CONFIG_DEVICE_NOTIFICATIONS To add a watch on this list, an event queue must be created and configured: fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); and then a watch can be placed upon it using a system call: watch_devices(fd, 12, 0); Unless the application wants to receive all events, it should employ appropriate filters. For example, to receive just USB notifications, it could do: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_USB_NOTIFY, .subtype_filter[0] = UINT_MAX, }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/watch_queue.rst | 22 ++++++ arch/alpha/kernel/syscalls/syscall.tbl | 1 arch/arm/tools/syscall.tbl | 1 arch/ia64/kernel/syscalls/syscall.tbl | 1 arch/m68k/kernel/syscalls/syscall.tbl | 1 arch/microblaze/kernel/syscalls/syscall.tbl | 1 arch/mips/kernel/syscalls/syscall_n32.tbl | 1 arch/mips/kernel/syscalls/syscall_n64.tbl | 1 arch/mips/kernel/syscalls/syscall_o32.tbl | 1 arch/parisc/kernel/syscalls/syscall.tbl | 1 arch/powerpc/kernel/syscalls/syscall.tbl | 1 arch/s390/kernel/syscalls/syscall.tbl | 1 arch/sh/kernel/syscalls/syscall.tbl | 1 arch/sparc/kernel/syscalls/syscall.tbl | 1 arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 arch/xtensa/kernel/syscalls/syscall.tbl | 1
drivers/base/Kconfig | 9 +++ drivers/base/Makefile | 1 drivers/base/watch.c | 94 +++++++++++++++++++++++++++ include/linux/device.h | 7 ++ include/linux/syscalls.h | 1 include/uapi/asm-generic/unistd.h | 4 + kernel/sys_ni.c | 1 24 files changed, 153 insertions(+), 2 deletions(-) create mode 100644 drivers/base/watch.c diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 6fb3aa3356d3..393905b904c8 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -276,6 +276,25 @@ The ``id`` is the ID of the source object (such as the serial number on a key). Only watches that have the same ID set in them will see this notification. +Global Device Watch List +======================== + +There is a global watch list that hardware-generated events, such as device +connection, disconnection, failure and error, can be posted upon. It must be +enabled using:: + + CONFIG_DEVICE_NOTIFICATIONS + +Watchpoints are set in userspace using the watch_devices(2) system call. +Within the kernel, events are posted upon it using:: + + void post_device_notification(struct watch_notification *n, u64 id); + +where ``n`` is the formatted notification record to post. ``id`` is an +identifier that can be used to direct the record to specific watches, but it +should be 0 for general use on this queue. + + Watch Sources ============= @@ -291,7 +310,8 @@ Any particular buffer can be fed from multiple sources. Sources include: * WATCH_TYPE_BLOCK_NOTIFY Notifications of this type indicate block layer events, such as I/O errors - or temporary link loss. Watches of this type are set on a global queue. + or temporary link loss. Watches of this type are set on the global device + watch list. 
Event Filtering diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 728fe028c02c..8e841d8e4c22 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -475,3 +475,4 @@ 543 common fspick sys_fspick 544 common pidfd_open sys_pidfd_open # 545 reserved for clone3 +546 common watch_devices sys_watch_devices diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 6da7dc4d79cc..0f080cf44cc9 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -449,3 +449,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 36d5faf4c86c..2f33f5db2fed 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -356,3 +356,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index a88a285a0e5f..83e4e8784b88 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -435,3 +435,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 09b0cd7dab0a..9a70a3be3b7b 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 
c9c879ec9b6d..2ba5b649f0ab 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -374,3 +374,4 @@ 433 n32 fspick sys_fspick 434 n32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n32 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index bbce9159caa1..ff350988584d 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -350,3 +350,4 @@ 433 n64 fspick sys_fspick 434 n64 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 n64 watch_devices sys_watch_devices diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 9653591428ec..7b26bd39900e 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -423,3 +423,4 @@ 433 o32 fspick sys_fspick 434 o32 pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 o32 watch_devices sys_watch_devices diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 670d1371aca1..d846365a4f7c 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -432,3 +432,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3_wrapper +436 common watch_devices sys_watch_devices diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 43f736ed47f2..0a503239ab5c 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -517,3 +517,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 nospu clone3 ppc_clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 3054e9c035a3..19b43c0d928a 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl 
+++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick sys_fspick 434 common pidfd_open sys_pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 sys_clone3 +436 common watch_devices sys_watch_devices sys_watch_devices diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index b5ed26c4c005..b454e07c9372 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8c8cc7537fb2..8ef43c27457e 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -481,3 +481,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open # 435 reserved for clone3 +436 common watch_devices sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index c00019abd076..0e34ddeb97a1 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -440,3 +440,4 @@ 433 i386 fspick sys_fspick __ia32_sys_fspick 434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open 435 i386 clone3 sys_clone3 __ia32_sys_clone3 +436 i386 watch_devices sys_watch_devices __ia32_sys_watch_devices diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index c29976eca4a8..29293d103829 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -357,6 +357,7 @@ 433 common fspick __x64_sys_fspick 434 common pidfd_open __x64_sys_pidfd_open 435 common clone3 __x64_sys_clone3/ptregs +436 common watch_devices __x64_sys_watch_devices # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl 
b/arch/xtensa/kernel/syscalls/syscall.tbl index 25f4de729a6d..243fa18b8d1e 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -406,3 +406,4 @@ 433 common fspick sys_fspick 434 common pidfd_open sys_pidfd_open 435 common clone3 sys_clone3 +436 common watch_devices sys_watch_devices diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index dc404492381d..7f899cae41a0 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -1,6 +1,15 @@ # SPDX-License-Identifier: GPL-2.0 menu "Generic Driver Options" +config DEVICE_NOTIFICATIONS + bool "Provide device event notifications" + depends on WATCH_QUEUE + help + This option provides support for getting hardware event notifications + on devices, buses and interfaces. This makes use of the + /dev/watch_queue misc device to handle the notification buffer. + watch_devices(2) is used to set/remove watches. + config UEVENT_HELPER bool "Support for uevent helper" help diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..4db2e8f1a1f4 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -7,6 +7,7 @@ obj-y := component.o core.o bus.o dd.o syscore.o \ attribute_container.o transport_class.o \ topology.o container.o property.o cacheinfo.o \ devcon.o swnode.o +obj-$(CONFIG_DEVICE_NOTIFICATIONS) += watch.o obj-$(CONFIG_DEVTMPFS) += devtmpfs.o obj-y += power/ obj-$(CONFIG_ISA_BUS_API) += isa.o diff --git a/drivers/base/watch.c b/drivers/base/watch.c new file mode 100644 index 000000000000..879f82225979 --- /dev/null +++ b/drivers/base/watch.c @@ -0,0 +1,94 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Event notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/device.h> +#include <linux/watch_queue.h> +#include <linux/syscalls.h> +#include <linux/init_task.h> +#include <linux/security.h> + +/* + * Global queue for watching for device layer events. + */ +static struct watch_list device_watchers = { + .watchers = HLIST_HEAD_INIT, + .lock = __SPIN_LOCK_UNLOCKED(&device_watchers.lock), +}; + +static DEFINE_SPINLOCK(device_watchers_lock); + +/** + * post_device_notification - Post notification of a device event + * @n: The notification to post + * @id: The device ID + * + * Note that there's only a global queue to which all events are posted. Might + * want to provide per-dev queues also. + */ +void post_device_notification(struct watch_notification *n, u64 id) +{ + post_watch_notification(&device_watchers, n, &init_cred, id); +} +EXPORT_SYMBOL(post_device_notification); + +/** + * sys_watch_devices - Watch for device events. + * @watch_fd: The watch queue to send notifications to. 
+ * @watch_id: The watch ID to be placed in the notification (-1 to remove watch) + * @flags: Flags (reserved for future) + */ +SYSCALL_DEFINE3(watch_devices, int, watch_fd, int, watch_id, unsigned int, flags) +{ + struct watch_queue *wqueue; + struct watch *watch = NULL; + long ret = -ENOMEM; + + if (watch_id < -1 || watch_id > 0xff || flags) + return -EINVAL; + + wqueue = get_watch_queue(watch_fd); + if (IS_ERR(wqueue)) { + ret = PTR_ERR(wqueue); + goto err; + } + + if (watch_id >= 0) { + watch = kzalloc(sizeof(*watch), GFP_KERNEL); + if (!watch) + goto err_wqueue; + + init_watch(watch, wqueue); + watch->info_id = (u32)watch_id << WATCH_INFO_ID__SHIFT; + watch->cred = get_current_cred(); + + ret = security_watch_devices(watch); + if (ret < 0) + goto err_watch; + + spin_lock(&device_watchers_lock); + ret = add_watch_to_object(watch, &device_watchers); + spin_unlock(&device_watchers_lock); + if (ret == 0) + watch = NULL; + } else { + spin_lock(&device_watchers_lock); + ret = remove_watch_from_object(&device_watchers, wqueue, 0, + false); + spin_unlock(&device_watchers_lock); + } + +err_watch: + if (watch) { + put_cred(watch->cred); + kfree(watch); + } +err_wqueue: + put_watch_queue(wqueue); +err: + return ret; +} diff --git a/include/linux/device.h b/include/linux/device.h index 6717adee33f0..9def6a53b598 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -43,6 +43,7 @@ struct iommu_group; struct iommu_fwspec; struct dev_pin_info; struct iommu_param; +struct watch_notification; struct bus_attribute { struct attribute attr; @@ -1412,6 +1413,12 @@ struct device_link *device_link_add(struct device *consumer, void device_link_del(struct device_link *link); void device_link_remove(void *consumer, struct device *supplier); +#ifdef CONFIG_DEVICE_NOTIFICATIONS +extern void post_device_notification(struct watch_notification *n, u64 id); +#else +static inline void post_device_notification(struct watch_notification *n, u64 id) {} +#endif + #ifndef dev_fmt 
#define dev_fmt(fmt) fmt #endif diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 88145da7d140..5bac5daec51e 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1000,6 +1000,7 @@ asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags) asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); +asmlinkage long sys_watch_devices(int watch_fd, int watch_id, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 1be0e798e362..fd63ff0196fd 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -850,9 +850,11 @@ __SYSCALL(__NR_pidfd_open, sys_pidfd_open) #define __NR_clone3 435 __SYSCALL(__NR_clone3, sys_clone3) #endif +#define __NR_watch_devices 436 +__SYSCALL(__NR_watch_devices, sys_watch_devices) #undef __NR_syscalls -#define __NR_syscalls 436 +#define __NR_syscalls 437 /* * 32 bit systems traditionally used different diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c index 34b76895b81e..184ad68c087f 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -51,6 +51,7 @@ COND_SYSCALL_COMPAT(io_pgetevents); COND_SYSCALL(io_uring_setup); COND_SYSCALL(io_uring_enter); COND_SYSCALL(io_uring_register); +COND_SYSCALL(watch_devices); /* fs/xattr.c */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 07/11] block: Add block layer notifications [ver #6] 2019-08-29 18:29 ` David Howells (?) @ 2019-08-29 18:30 ` David Howells -1 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:30 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel Add a block layer notification mechanism whereby notifications about block-layer events, such as I/O errors, can be reported to a monitoring process asynchronously. Firstly, an event queue needs to be created: fd = open("/dev/watch_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); then a notification can be set up to report block notifications via that queue: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_BLOCK_NOTIFY, .subtype_filter[0] = UINT_MAX, }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); watch_devices(fd, 12, 0); After that, records will be placed into the queue when, for example, errors occur on a block device. Records are of the following format: struct block_notification { struct watch_notification watch; __u64 dev; __u64 sector; } *n; Where: n->watch.type will be WATCH_TYPE_BLOCK_NOTIFY. n->watch.subtype will be the type of notification, such as NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM. n->watch.info & WATCH_INFO_LENGTH will indicate the length of the record. n->watch.info & WATCH_INFO_ID will be the second argument to watch_devices(), shifted. n->dev will be the device numbers munged together. n->sector will indicate the affected sector (if appropriate for the event). Note that it is permissible for event records to be of variable length - or, at least, the length may be dependent on the subtype.
Signed-off-by: David Howells <dhowells@redhat.com> --- Documentation/watch_queue.rst | 4 +++- block/Kconfig | 9 +++++++++ block/blk-core.c | 29 +++++++++++++++++++++++++++++ include/linux/blkdev.h | 15 +++++++++++++++ include/uapi/linux/watch_queue.h | 30 +++++++++++++++++++++++++++++- 5 files changed, 85 insertions(+), 2 deletions(-) diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst index 393905b904c8..5cc9c6924727 100644 --- a/Documentation/watch_queue.rst +++ b/Documentation/watch_queue.rst @@ -7,7 +7,9 @@ receive notifications from the kernel. This can be used in conjunction with:: * Key/keyring notifications - * General device event notifications + * General device event notifications, including:: + + * Block layer event notifications The notifications buffers can be enabled by: diff --git a/block/Kconfig b/block/Kconfig index 8b5f8e560eb4..cc93e4ca29a7 100644 --- a/block/Kconfig +++ b/block/Kconfig @@ -164,6 +164,15 @@ config BLK_SED_OPAL Enabling this option enables users to setup/unlock/lock Locking ranges for SED devices using the Opal protocol. +config BLK_NOTIFICATIONS + bool "Block layer event notifications" + depends on DEVICE_NOTIFICATIONS + help + This option provides support for getting block layer event + notifications. This makes use of the /dev/watch_queue misc device to + handle the notification buffer and uses the watch_devices() system + call to enable/disable watches. 
+ menu "Partition Types" source "block/partitions/Kconfig" diff --git a/block/blk-core.c b/block/blk-core.c index d0cc6e14d2f0..8ab1e07aa311 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -181,6 +181,22 @@ static const struct { [BLK_STS_IOERR] = { -EIO, "I/O" }, }; +#ifdef CONFIG_BLK_NOTIFICATIONS +static const +enum block_notification_type blk_notifications[ARRAY_SIZE(blk_errors)] = { + [BLK_STS_TIMEOUT] = NOTIFY_BLOCK_ERROR_TIMEOUT, + [BLK_STS_NOSPC] = NOTIFY_BLOCK_ERROR_NO_SPACE, + [BLK_STS_TRANSPORT] = NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT, + [BLK_STS_TARGET] = NOTIFY_BLOCK_ERROR_CRITICAL_TARGET, + [BLK_STS_NEXUS] = NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS, + [BLK_STS_MEDIUM] = NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM, + [BLK_STS_PROTECTION] = NOTIFY_BLOCK_ERROR_PROTECTION, + [BLK_STS_RESOURCE] = NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE, + [BLK_STS_DEV_RESOURCE] = NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE, + [BLK_STS_IOERR] = NOTIFY_BLOCK_ERROR_IO, +}; +#endif + blk_status_t errno_to_blk_status(int errno) { int i; @@ -221,6 +237,19 @@ static void print_req_error(struct request *req, blk_status_t status, req->cmd_flags & ~REQ_OP_MASK, req->nr_phys_segments, IOPRIO_PRIO_CLASS(req->ioprio)); + +#ifdef CONFIG_BLK_NOTIFICATIONS + if (blk_notifications[idx]) { + struct block_notification n = { + .watch.type = WATCH_TYPE_BLOCK_NOTIFY, + .watch.subtype = blk_notifications[idx], + .watch.info = watch_sizeof(n), + .dev = req->rq_disk ? 
disk_devt(req->rq_disk) : 0, + .sector = blk_rq_pos(req), + }; + post_block_notification(&n); + } +#endif } static void req_bio_endio(struct request *rq, struct bio *bio, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1ef375dafb1c..5d856f670a8f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -27,6 +27,7 @@ #include <linux/percpu-refcount.h> #include <linux/scatterlist.h> #include <linux/blkzoned.h> +#include <linux/watch_queue.h> struct module; struct scsi_ioctl_command; @@ -1742,6 +1743,20 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq) } #endif /* CONFIG_BLK_DEV_ZONED */ +#ifdef CONFIG_BLK_NOTIFICATIONS +static inline void post_block_notification(struct block_notification *n) +{ + u64 id = 0; /* Might want to allow dev# here. */ + + post_device_notification(&n->watch, id); +} +#else +static inline void post_block_notification(struct block_notification *n) +{ +} +#endif + + #else /* CONFIG_BLOCK */ struct block_device; diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h index 654d4ba8b909..9a6c059af09d 100644 --- a/include/uapi/linux/watch_queue.h +++ b/include/uapi/linux/watch_queue.h @@ -11,7 +11,8 @@ enum watch_notification_type { WATCH_TYPE_META = 0, /* Special record */ WATCH_TYPE_KEY_NOTIFY = 1, /* Key change event notification */ - WATCH_TYPE___NR = 2 + WATCH_TYPE_BLOCK_NOTIFY = 2, /* Block layer event notification */ + WATCH_TYPE___NR = 3 }; enum watch_meta_notification_subtype { @@ -124,4 +125,31 @@ struct key_notification { __u32 aux; /* Per-type auxiliary data */ }; +/* + * Type of block layer notification. 
+ */ +enum block_notification_type { + NOTIFY_BLOCK_ERROR_TIMEOUT = 1, /* Timeout error */ + NOTIFY_BLOCK_ERROR_NO_SPACE = 2, /* Critical space allocation error */ + NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT = 3, /* Recoverable transport error */ + NOTIFY_BLOCK_ERROR_CRITICAL_TARGET = 4, /* Critical target error */ + NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS = 5, /* Critical nexus error */ + NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM = 6, /* Critical medium error */ + NOTIFY_BLOCK_ERROR_PROTECTION = 7, /* Protection error */ + NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE = 8, /* Kernel resource error */ + NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE = 9, /* Device resource error */ + NOTIFY_BLOCK_ERROR_IO = 10, /* Other I/O error */ +}; + +/* + * Block layer notification record. + * - watch.type = WATCH_TYPE_BLOCK_NOTIFY + * - watch.subtype = enum block_notification_type + */ +struct block_notification { + struct watch_notification watch; /* WATCH_TYPE_BLOCK_NOTIFY */ + __u64 dev; /* Device number */ + __u64 sector; /* Affected sector */ +}; + #endif /* _UAPI_LINUX_WATCH_QUEUE_H */ ^ permalink raw reply related [flat|nested] 234+ messages in thread
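
For anyone consuming these records from userspace, here is a minimal sketch (not part of the patch) of how a block_notification might be turned into a readable message.  The subtype values are copied from enum block_notification_type above.  block_error_name(), sketch_major(), sketch_minor() and describe_block_event() are hypothetical helpers, and the sketch assumes that n->dev carries the kernel-internal dev_t encoding, (major << 20) | minor, which is what passing disk_devt() through unchanged suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Subtype values as listed in the patch's enum block_notification_type. */
enum block_notification_type {
	NOTIFY_BLOCK_ERROR_TIMEOUT		 = 1,
	NOTIFY_BLOCK_ERROR_NO_SPACE		 = 2,
	NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT = 3,
	NOTIFY_BLOCK_ERROR_CRITICAL_TARGET	 = 4,
	NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS	 = 5,
	NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM	 = 6,
	NOTIFY_BLOCK_ERROR_PROTECTION		 = 7,
	NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE	 = 8,
	NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE	 = 9,
	NOTIFY_BLOCK_ERROR_IO			 = 10,
};

/* Hypothetical helper: printable name for a subtype, "unknown" otherwise. */
static const char *block_error_name(unsigned int subtype)
{
	static const char *const names[] = {
		[NOTIFY_BLOCK_ERROR_TIMEOUT]		   = "timeout",
		[NOTIFY_BLOCK_ERROR_NO_SPACE]		   = "no space",
		[NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT] = "recoverable transport",
		[NOTIFY_BLOCK_ERROR_CRITICAL_TARGET]	   = "critical target",
		[NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS]	   = "critical nexus",
		[NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM]	   = "critical medium",
		[NOTIFY_BLOCK_ERROR_PROTECTION]		   = "protection",
		[NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE]	   = "kernel resource",
		[NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE]	   = "device resource",
		[NOTIFY_BLOCK_ERROR_IO]			   = "I/O",
	};

	if (subtype >= sizeof(names) / sizeof(names[0]) || !names[subtype])
		return "unknown";
	return names[subtype];
}

/* Assumption: dev is the kernel-internal dev_t, (major << 20) | minor. */
#define SKETCH_MINORBITS 20

static unsigned int sketch_major(uint64_t dev)
{
	return (unsigned int)(dev >> SKETCH_MINORBITS);
}

static unsigned int sketch_minor(uint64_t dev)
{
	return (unsigned int)(dev & ((UINT64_C(1) << SKETCH_MINORBITS) - 1));
}

/* Format one event into buf; returns what snprintf() returns. */
static int describe_block_event(char *buf, size_t len, unsigned int subtype,
				uint64_t dev, uint64_t sector)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s error on dev %u:%u at sector %llu",
			block_error_name(subtype),
			sketch_major(dev), sketch_minor(dev),
			(unsigned long long)sector);
}
```

A reader loop would call describe_block_event() with n->watch.subtype, n->dev and n->sector once it has checked that n->watch.type is WATCH_TYPE_BLOCK_NOTIFY.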
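
The filter in the block patch's commit message sets .subtype_filter[0] to UINT_MAX, which lets a whole word's worth of subtypes through.  A narrower filter could be built one subtype at a time.  The sketch below is only an illustration: it assumes the 256 subtypes are tracked in a bitmap of eight 32-bit words, with subtype s held at bit (s % 32) of word (s / 32).  That layout and all the sketch_* names are assumptions, not taken from the posted header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_SUBTYPES 256

/* Hypothetical stand-in for the subtype_filter[] member of one filter. */
struct sketch_subtype_filter {
	uint32_t bits[SKETCH_NR_SUBTYPES / 32];
};

/* Let records with the given subtype through the filter. */
static void sketch_filter_allow(struct sketch_subtype_filter *f,
				unsigned int subtype)
{
	assert(subtype < SKETCH_NR_SUBTYPES);
	f->bits[subtype / 32] |= UINT32_C(1) << (subtype % 32);
}

/* Returns non-zero if the subtype would pass, 0 otherwise. */
static int sketch_filter_test(const struct sketch_subtype_filter *f,
			      unsigned int subtype)
{
	if (subtype >= SKETCH_NR_SUBTYPES)
		return 0;
	return (f->bits[subtype / 32] >> (subtype % 32)) & 1;
}
```

Under that assumed layout, allowing only NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM (6) and NOTIFY_BLOCK_ERROR_IO (10) would set two bits in word 0 rather than all thirty-two.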
* [PATCH 08/11] usb: Add USB subsystem notifications [ver #6]
  2019-08-29 18:29 ` David Howells
@ 2019-08-29 18:31 ` David Howells
  -1 siblings, 0 replies; 234+ messages in thread
From: David Howells @ 2019-08-29 18:31 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman,
	nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb,
	linux-security-module, linux-fsdevel, linux-api, linux-block,
	linux-kernel

Add a USB subsystem notification mechanism whereby notifications about
hardware events, such as device connection, disconnection, reset and I/O
errors, can be reported to a monitoring process asynchronously.

Firstly, an event queue needs to be created:

	fd = open("/dev/watch_queue", O_RDWR);
	ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n);

then a notification can be set up to report USB notifications via that
queue:

	struct watch_notification_filter filter = {
		.nr_filters = 1,
		.filters = {
			[0] = {
				.type = WATCH_TYPE_USB_NOTIFY,
				.subtype_filter[0] = UINT_MAX,
			},
		},
	};
	ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter);
	watch_devices(fd, 12);

After that, records will be placed into the queue when events occur on a
USB device or bus.  Records are of the following format:

	struct usb_notification {
		struct watch_notification watch;
		__u32	error;
		__u32	reserved;
		__u8	name_len;
		__u8	name[0];
	} *n;

Where:

	n->watch.type will be WATCH_TYPE_USB_NOTIFY.

	n->watch.subtype will be the type of notification, such as
	NOTIFY_USB_DEVICE_ADD.

	n->watch.info & WATCH_INFO_LENGTH will indicate the length of the
	record.

	n->watch.info & WATCH_INFO_ID will be the second argument to
	watch_devices(), shifted.

	n->error and n->reserved are intended to convey information such as
	error codes, but are currently not used.

	n->name_len and n->name convey the USB device name as an
	unterminated string.  This may be truncated - it is currently
	limited to a maximum of 63 characters.

Note that it is permissible for event records to be of variable length -
or, at least, the length may be dependent on the subtype.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: linux-usb@vger.kernel.org
---

 Documentation/watch_queue.rst    |    9 ++++++
 drivers/usb/core/Kconfig         |    9 ++++++
 drivers/usb/core/devio.c         |   56 ++++++++++++++++++++++++++++++++++++++
 drivers/usb/core/hub.c           |    4 +++
 include/linux/usb.h              |   18 ++++++++++++
 include/uapi/linux/watch_queue.h |   30 ++++++++++++++++++++
 6 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 5cc9c6924727..4087a8e670a8 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -11,6 +11,8 @@ receive notifications from the kernel.  This can be used in conjunction with::
 
     * Block layer event notifications
 
+  * USB subsystem event notifications
+
 
 The notifications buffers can be enabled by:
 
@@ -315,6 +317,13 @@ Any particular buffer can be fed from multiple sources.  Sources include:
     or temporary link loss.  Watches of this type are set on the global device
     watch list.
 
+  * WATCH_TYPE_USB_NOTIFY
+
+    Notifications of this type indicate USB subsystem events, such as
+    attachment, removal, reset and I/O errors.  Separate events are generated
+    for buses and devices.  Watchpoints of this type are set on the global
+    device watch list.
+
 
 Event Filtering
 ===============
diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig
index ecaacc8ed311..57e7b649e48b 100644
--- a/drivers/usb/core/Kconfig
+++ b/drivers/usb/core/Kconfig
@@ -102,3 +102,12 @@ config USB_AUTOSUSPEND_DELAY
 	  The default value Linux has always had is 2 seconds.  Change
 	  this value if you want a different delay and cannot modify
 	  the command line or module parameter.
+
+config USB_NOTIFICATIONS
+	bool "Provide USB hardware event notifications"
+	depends on USB && DEVICE_NOTIFICATIONS
+	help
+	  This option provides support for getting hardware event notifications
+	  on USB devices and interfaces.  This makes use of the
+	  /dev/watch_queue misc device to handle the notification buffer.
+	  watch_devices(2) is used to set/remove watches.
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 9063ede411ae..b8572e4d6a1b 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -41,6 +41,7 @@
 #include <linux/dma-mapping.h>
 #include <asm/byteorder.h>
 #include <linux/moduleparam.h>
+#include <linux/watch_queue.h>
 
 #include "usb.h"
 
@@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev)
 	}
 }
 
+#ifdef CONFIG_USB_NOTIFICATIONS
+static noinline void post_usb_notification(const char *devname,
+					   enum usb_notification_type subtype,
+					   u32 error)
+{
+	unsigned int gran = WATCH_LENGTH_GRANULARITY;
+	unsigned int name_len, n_len;
+	u64 id = 0; /* Might want to put a dev# here. */
+
+	struct {
+		struct usb_notification n;
+		char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
+			       (sizeof(struct usb_notification) -
+				offsetof(struct usb_notification, name))];
+	} n;
+
+	name_len = strlen(devname);
+	name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
+	n_len = round_up(offsetof(struct usb_notification, name) + name_len,
+			 gran) / gran;
+
+	memset(&n, 0, sizeof(n));
+	memcpy(n.n.name, devname, name_len);
+
+	n.n.watch.type		= WATCH_TYPE_USB_NOTIFY;
+	n.n.watch.subtype	= subtype;
+	n.n.watch.info		= n_len;
+	n.n.error		= error;
+	n.n.name_len		= name_len;
+
+	post_device_notification(&n.n.watch, id);
+}
+
+void post_usb_device_notification(const struct usb_device *udev,
+				  enum usb_notification_type subtype, u32 error)
+{
+	post_usb_notification(dev_name(&udev->dev), subtype, error);
+}
+
+void post_usb_bus_notification(const struct usb_bus *ubus,
+			       enum usb_notification_type subtype, u32 error)
+{
+	post_usb_notification(ubus->bus_name, subtype, error);
+}
+#endif
+
 static int usbdev_notify(struct notifier_block *self,
 			       unsigned long action, void *dev)
 {
 	switch (action) {
 	case USB_DEVICE_ADD:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
 		break;
 	case USB_DEVICE_REMOVE:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
+		usbdev_remove(dev);
+		break;
+	case USB_BUS_ADD:
+		post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
+		break;
+	case USB_BUS_REMOVE:
+		post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
 		usbdev_remove(dev);
 		break;
 	}
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 236313f41f4a..e8ebacc15a32 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -29,6 +29,7 @@
 #include <linux/random.h>
 #include <linux/pm_qos.h>
 #include <linux/kobject.h>
+#include <linux/watch_queue.h>
 
 #include <linux/uaccess.h>
 #include <asm/byteorder.h>
@@ -4605,6 +4606,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
 				(udev->config) ? "reset" : "new", speed,
 				devnum, driver_name);
 
+	if (udev->config)
+		post_usb_device_notification(udev, NOTIFY_USB_DEVICE_RESET, 0);
+
 	/* Set up TT records, if needed */
 	if (hdev->tt) {
 		udev->tt = hdev->tt;
diff --git a/include/linux/usb.h b/include/linux/usb.h
index e87826e23d59..ddfb9dc2473e 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -26,6 +26,7 @@
 struct usb_device;
 struct usb_driver;
 struct wusb_dev;
+enum usb_notification_type;
 
 /*-------------------------------------------------------------------------*/
 
@@ -2010,6 +2011,23 @@ extern void usb_led_activity(enum usb_led_event ev);
 static inline void usb_led_activity(enum usb_led_event ev) {}
 #endif
 
+/*
+ * Notification functions.
+ */
+#ifdef CONFIG_USB_NOTIFICATIONS
+extern void post_usb_device_notification(const struct usb_device *udev,
+					 enum usb_notification_type subtype,
+					 u32 error);
+extern void post_usb_bus_notification(const struct usb_bus *ubus,
+				      enum usb_notification_type subtype,
+				      u32 error);
+#else
+static inline void post_usb_device_notification(const struct usb_device *udev,
+						unsigned int subtype, u32 error) {}
+static inline void post_usb_bus_notification(const struct usb_bus *ubus,
+					     unsigned int subtype, u32 error) {}
+#endif
+
 #endif  /* __KERNEL__ */
 
 #endif
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 9a6c059af09d..bc5183e10d8c 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -12,7 +12,8 @@ enum watch_notification_type {
 	WATCH_TYPE_META		= 0,	/* Special record */
 	WATCH_TYPE_KEY_NOTIFY	= 1,	/* Key change event notification */
 	WATCH_TYPE_BLOCK_NOTIFY	= 2,	/* Block layer event notification */
-	WATCH_TYPE___NR		= 3
+	WATCH_TYPE_USB_NOTIFY	= 3,	/* USB subsystem event notification */
+	WATCH_TYPE___NR		= 4
 };
 
 enum watch_meta_notification_subtype {
@@ -152,4 +153,31 @@ struct block_notification {
 	__u64	sector;			/* Affected sector */
 };
 
+/*
+ * Type of USB layer notification.
+ */
+enum usb_notification_type {
+	NOTIFY_USB_DEVICE_ADD		= 0, /* USB device added */
+	NOTIFY_USB_DEVICE_REMOVE	= 1, /* USB device removed */
+	NOTIFY_USB_BUS_ADD		= 2, /* USB bus added */
+	NOTIFY_USB_BUS_REMOVE		= 3, /* USB bus removed */
+	NOTIFY_USB_DEVICE_RESET		= 4, /* USB device reset */
+	NOTIFY_USB_DEVICE_ERROR		= 5, /* USB device error */
+};
+
+/*
+ * USB subsystem notification record.
+ * - watch.type = WATCH_TYPE_USB_NOTIFY
+ * - watch.subtype = enum usb_notification_type
+ */
+struct usb_notification {
+	struct watch_notification watch; /* WATCH_TYPE_USB_NOTIFY */
+	__u32	error;
+	__u32	reserved;
+	__u8	name_len;		/* Length of device name */
+	__u8	name[0];		/* Device name (padded to __u64, truncated at 63 chars) */
+};
+
+#define USB_NOTIFICATION_MAX_NAME_LEN 63
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
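
Because the USB record carries an unterminated, variable-length name, a consumer has to size and copy it with some care.  The following is a rough sketch, not part of the patch, of the length arithmetic used by post_usb_notification() and of a safe name copy on the receiving side.  It assumes an 8-byte struct watch_notification header and a length granularity of 8 bytes (the "padded to __u64" comment hints at this, but the real WATCH_LENGTH_GRANULARITY is not shown here).  All sketch_* names are hypothetical mirrors of the patch's types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define USB_NOTIFICATION_MAX_NAME_LEN 63	/* from the patch */
#define SKETCH_GRANULARITY 8			/* assumed length unit */

/* Assumed 8-byte header: packed type/subtype word plus the info word. */
struct sketch_watch_notification {
	uint32_t type_subtype;
	uint32_t info;
};

/* Mirror of struct usb_notification as described in the commit message. */
struct sketch_usb_notification {
	struct sketch_watch_notification watch;
	uint32_t error;
	uint32_t reserved;
	uint8_t  name_len;
	uint8_t  name[];	/* unterminated device name */
};

/* Record length in granularity units, as the kernel side computes n_len. */
static size_t sketch_usb_record_units(size_t name_len)
{
	size_t bytes;

	if (name_len > USB_NOTIFICATION_MAX_NAME_LEN)
		name_len = USB_NOTIFICATION_MAX_NAME_LEN;
	bytes = offsetof(struct sketch_usb_notification, name) + name_len;
	return (bytes + SKETCH_GRANULARITY - 1) / SKETCH_GRANULARITY;
}

/* Copy the name into a NUL-terminated buffer; returns bytes copied. */
static size_t sketch_usb_copy_name(const struct sketch_usb_notification *n,
				   char *out, size_t out_len)
{
	size_t len = n->name_len;

	assert(out != NULL && out_len > 0);
	if (len > out_len - 1)
		len = out_len - 1;
	memcpy(out, n->name, len);
	out[len] = '\0';
	return len;
}
```

Note that the number of name bytes copied is name_len, whereas the value stored in watch.info is the rounded-up record length in granularity units; the two must not be confused.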
* [PATCH 08/11] usb: Add USB subsystem notifications [ver #6] @ 2019-08-29 18:31 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:31 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Add a USB subsystem notification mechanism whereby notifications about hardware events such as device connection, disconnection, reset and I/O errors, can be reported to a monitoring process asynchronously. Firstly, an event queue needs to be created: fd = open("/dev/event_queue", O_RDWR); ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, page_size << n); then a notification can be set up to report USB notifications via that queue: struct watch_notification_filter filter = { .nr_filters = 1, .filters = { [0] = { .type = WATCH_TYPE_USB_NOTIFY, .subtype_filter[0] = UINT_MAX; }, }, }; ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter); notify_devices(fd, 12); After that, records will be placed into the queue when events occur on a USB device or bus. Records are of the following format: struct usb_notification { struct watch_notification watch; __u32 error; __u32 reserved; __u8 name_len; __u8 name[0]; } *n; Where: n->watch.type will be WATCH_TYPE_USB_NOTIFY n->watch.subtype will be the type of notification, such as NOTIFY_USB_DEVICE_ADD. n->watch.info & WATCH_INFO_LENGTH will indicate the length of the record. n->watch.info & WATCH_INFO_ID will be the second argument to device_notify(), shifted. n->error and n->reserved are intended to convey information such as error codes, but are currently not used n->name_len and n->name convey the USB device name as an unterminated string. This may be truncated - it is currently limited to a maximum 63 chars. Note that it is permissible for event records to be of variable length - or, at least, the length may be dependent on the subtype. 
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: linux-usb@vger.kernel.org
---

 Documentation/watch_queue.rst    |    9 ++++++
 drivers/usb/core/Kconfig         |    9 ++++++
 drivers/usb/core/devio.c         |   56 ++++++++++++++++++++++++++++++++++++++
 drivers/usb/core/hub.c           |    4 +++
 include/linux/usb.h              |   18 ++++++++++++
 include/uapi/linux/watch_queue.h |   30 ++++++++++++++++++++
 6 files changed, 125 insertions(+), 1 deletion(-)

diff --git a/Documentation/watch_queue.rst b/Documentation/watch_queue.rst
index 5cc9c6924727..4087a8e670a8 100644
--- a/Documentation/watch_queue.rst
+++ b/Documentation/watch_queue.rst
@@ -11,6 +11,8 @@ receive notifications from the kernel.  This can be used in conjunction with::
 
   * Block layer event notifications
 
+  * USB subsystem event notifications
+
 
 The notifications buffers can be enabled by:
 
@@ -315,6 +317,13 @@ Any particular buffer can be fed from multiple sources.  Sources include:
     or temporary link loss.  Watches of this type are set on the global device
     watch list.
 
+  * WATCH_TYPE_USB_NOTIFY
+
+    Notifications of this type indicate USB subsystem events, such as
+    attachment, removal, reset and I/O errors.  Separate events are generated
+    for buses and devices.  Watchpoints of this type are set on the global
+    device watch list.
+
 
 Event Filtering
 ===============
diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig
index ecaacc8ed311..57e7b649e48b 100644
--- a/drivers/usb/core/Kconfig
+++ b/drivers/usb/core/Kconfig
@@ -102,3 +102,12 @@ config USB_AUTOSUSPEND_DELAY
 	  The default value Linux has always had is 2 seconds.  Change
 	  this value if you want a different delay and cannot modify
 	  the command line or module parameter.
+
+config USB_NOTIFICATIONS
+	bool "Provide USB hardware event notifications"
+	depends on USB && DEVICE_NOTIFICATIONS
+	help
+	  This option provides support for getting hardware event notifications
+	  on USB devices and interfaces.  This makes use of the
+	  /dev/watch_queue misc device to handle the notification buffer.
+	  watch_devices(2) is used to set/remove watches.
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 9063ede411ae..b8572e4d6a1b 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -41,6 +41,7 @@
 #include <linux/dma-mapping.h>
 #include <asm/byteorder.h>
 #include <linux/moduleparam.h>
+#include <linux/watch_queue.h>
 
 #include "usb.h"
 
@@ -2660,13 +2661,68 @@ static void usbdev_remove(struct usb_device *udev)
 	}
 }
 
+#ifdef CONFIG_USB_NOTIFICATIONS
+static noinline void post_usb_notification(const char *devname,
+					   enum usb_notification_type subtype,
+					   u32 error)
+{
+	unsigned int gran = WATCH_LENGTH_GRANULARITY;
+	unsigned int name_len, n_len;
+	u64 id = 0; /* Might want to put a dev# here. */
+
+	struct {
+		struct usb_notification n;
+		char more_name[USB_NOTIFICATION_MAX_NAME_LEN -
+			       (sizeof(struct usb_notification) -
+				offsetof(struct usb_notification, name))];
+	} n;
+
+	name_len = strlen(devname);
+	name_len = min_t(size_t, name_len, USB_NOTIFICATION_MAX_NAME_LEN);
+	n_len = round_up(offsetof(struct usb_notification, name) + name_len,
+			 gran) / gran;
+
+	memset(&n, 0, sizeof(n));
+	memcpy(n.n.name, devname, name_len);
+
+	n.n.watch.type		= WATCH_TYPE_USB_NOTIFY;
+	n.n.watch.subtype	= subtype;
+	n.n.watch.info		= n_len;
+	n.n.error		= error;
+	n.n.name_len		= name_len;
+
+	post_device_notification(&n.n.watch, id);
+}
+
+void post_usb_device_notification(const struct usb_device *udev,
+				  enum usb_notification_type subtype, u32 error)
+{
+	post_usb_notification(dev_name(&udev->dev), subtype, error);
+}
+
+void post_usb_bus_notification(const struct usb_bus *ubus,
+			       enum usb_notification_type subtype, u32 error)
+{
+	post_usb_notification(ubus->bus_name, subtype, error);
+}
+#endif
+
 static int usbdev_notify(struct notifier_block *self,
 			       unsigned long action, void *dev)
 {
 	switch (action) {
 	case USB_DEVICE_ADD:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_ADD, 0);
 		break;
 	case USB_DEVICE_REMOVE:
+		post_usb_device_notification(dev, NOTIFY_USB_DEVICE_REMOVE, 0);
+		usbdev_remove(dev);
+		break;
+	case USB_BUS_ADD:
+		post_usb_bus_notification(dev, NOTIFY_USB_BUS_ADD, 0);
+		break;
+	case USB_BUS_REMOVE:
+		post_usb_bus_notification(dev, NOTIFY_USB_BUS_REMOVE, 0);
 		usbdev_remove(dev);
 		break;
 	}
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 236313f41f4a..e8ebacc15a32 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -29,6 +29,7 @@
 #include <linux/random.h>
 #include <linux/pm_qos.h>
 #include <linux/kobject.h>
+#include <linux/watch_queue.h>
 
 #include <linux/uaccess.h>
 #include <asm/byteorder.h>
@@ -4605,6 +4606,9 @@ hub_port_init(struct usb_hub *hub, struct usb_device *udev, int port1,
 				(udev->config) ? "reset" : "new", speed,
 				devnum, driver_name);
 
+	if (udev->config)
+		post_usb_device_notification(udev, NOTIFY_USB_DEVICE_RESET, 0);
+
 	/* Set up TT records, if needed */
 	if (hdev->tt) {
 		udev->tt = hdev->tt;
diff --git a/include/linux/usb.h b/include/linux/usb.h
index e87826e23d59..ddfb9dc2473e 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -26,6 +26,7 @@
 struct usb_device;
 struct usb_driver;
 struct wusb_dev;
+enum usb_notification_type;
 
 /*-------------------------------------------------------------------------*/
 
@@ -2010,6 +2011,23 @@ extern void usb_led_activity(enum usb_led_event ev);
 static inline void usb_led_activity(enum usb_led_event ev) {}
 #endif
 
+/*
+ * Notification functions.
+ */
+#ifdef CONFIG_USB_NOTIFICATIONS
+extern void post_usb_device_notification(const struct usb_device *udev,
+					 enum usb_notification_type subtype,
+					 u32 error);
+extern void post_usb_bus_notification(const struct usb_bus *ubus,
+				      enum usb_notification_type subtype,
+				      u32 error);
+#else
+static inline void post_usb_device_notification(const struct usb_device *udev,
+						unsigned int subtype, u32 error) {}
+static inline void post_usb_bus_notification(const struct usb_bus *ubus,
+					     unsigned int subtype, u32 error) {}
+#endif
+
 #endif  /* __KERNEL__ */
 
 #endif
diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
index 9a6c059af09d..bc5183e10d8c 100644
--- a/include/uapi/linux/watch_queue.h
+++ b/include/uapi/linux/watch_queue.h
@@ -12,7 +12,8 @@ enum watch_notification_type {
 	WATCH_TYPE_META		= 0,	/* Special record */
 	WATCH_TYPE_KEY_NOTIFY	= 1,	/* Key change event notification */
 	WATCH_TYPE_BLOCK_NOTIFY	= 2,	/* Block layer event notification */
-	WATCH_TYPE___NR		= 3
+	WATCH_TYPE_USB_NOTIFY	= 3,	/* USB subsystem event notification */
+	WATCH_TYPE___NR		= 4
 };
 
 enum watch_meta_notification_subtype {
@@ -152,4 +153,31 @@ struct block_notification {
 	__u64	sector;			/* Affected sector */
 };
 
+/*
+ * Type of USB layer notification.
+ */
+enum usb_notification_type {
+	NOTIFY_USB_DEVICE_ADD		= 0, /* USB device added */
+	NOTIFY_USB_DEVICE_REMOVE	= 1, /* USB device removed */
+	NOTIFY_USB_BUS_ADD		= 2, /* USB bus added */
+	NOTIFY_USB_BUS_REMOVE		= 3, /* USB bus removed */
+	NOTIFY_USB_DEVICE_RESET		= 4, /* USB device reset */
+	NOTIFY_USB_DEVICE_ERROR		= 5, /* USB device error */
+};
+
+/*
+ * USB subsystem notification record.
+ * - watch.type = WATCH_TYPE_USB_NOTIFY
+ * - watch.subtype = enum usb_notification_type
+ */
+struct usb_notification {
+	struct watch_notification watch; /* WATCH_TYPE_USB_NOTIFY */
+	__u32	error;
+	__u32	reserved;
+	__u8	name_len;		/* Length of device name */
+	__u8	name[0];		/* Device name (padded to __u64, truncated at 63 chars) */
+};
+
+#define USB_NOTIFICATION_MAX_NAME_LEN 63
+
 #endif /* _UAPI_LINUX_WATCH_QUEUE_H */
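The record layout above can be exercised in userspace with a small sketch. It is only an illustration under stated assumptions: the DEMO_INFO_* masks, the ID shift of 24 and the 8-byte granule are guesses at the values behind WATCH_INFO_LENGTH, WATCH_INFO_ID and WATCH_LENGTH_GRANULARITY, and struct demo_usb_record is a hypothetical fixed-size mirror of struct usb_notification rather than the UAPI type. The builder copies exactly name_len bytes of the name and stores the rounded-up length in granules, and the reader turns the unterminated name back into a C string after checking it against the record length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_INFO_LENGTH	0x0000003fu	/* assumed: length in granules */
#define DEMO_INFO_ID		0xff000000u	/* assumed: watch ID field */
#define DEMO_INFO_ID_SHIFT	24		/* assumed */
#define DEMO_GRANULARITY	8u		/* assumed granule size in bytes */
#define DEMO_MAX_NAME_LEN	63u		/* USB_NOTIFICATION_MAX_NAME_LEN */

/* Hypothetical fixed-size mirror of struct usb_notification. */
struct demo_usb_record {
	uint32_t type_subtype;	/* type/subtype word; packing not modelled */
	uint32_t info;
	uint32_t error;
	uint32_t reserved;
	uint8_t  name_len;
	uint8_t  name[DEMO_MAX_NAME_LEN];
};

/* Fill in a record for devname, truncating the name to 63 bytes. */
static void demo_build(struct demo_usb_record *r, const char *devname,
		       uint32_t watch_id)
{
	size_t name_len = strlen(devname);
	size_t units;

	if (name_len > DEMO_MAX_NAME_LEN)
		name_len = DEMO_MAX_NAME_LEN;
	/* Header plus name, rounded up to whole granules. */
	units = (offsetof(struct demo_usb_record, name) + name_len +
		 DEMO_GRANULARITY - 1) / DEMO_GRANULARITY;

	memset(r, 0, sizeof(*r));
	memcpy(r->name, devname, name_len);
	r->name_len = (uint8_t)name_len;
	r->info = (uint32_t)units |
		  ((watch_id << DEMO_INFO_ID_SHIFT) & DEMO_INFO_ID);
}

/*
 * Copy the unterminated name into out[] as a C string.  Returns the name
 * length, or -1 if name_len does not fit in the record or in out[].
 */
static int demo_name(const struct demo_usb_record *r, char *out,
		     size_t out_size)
{
	size_t rec_bytes = (r->info & DEMO_INFO_LENGTH) * DEMO_GRANULARITY;
	size_t name_off = offsetof(struct demo_usb_record, name);

	if (r->name_len > DEMO_MAX_NAME_LEN ||
	    name_off + r->name_len > rec_bytes ||
	    r->name_len >= out_size)
		return -1;
	memcpy(out, r->name, r->name_len);
	out[r->name_len] = '\0';
	return r->name_len;
}
```

With this layout the name starts at byte offset 17, so a four-character name such as "usb1" needs 21 bytes and is reported as three 8-byte granules.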
* [PATCH 09/11] Add sample notification program [ver #6]
From: David Howells @ 2019-08-29 18:31 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, dhowells, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-security-module, linux-kernel

This needs to be linked with -lkeyutils.

It is run like:

	./watch_test

and watches the current session keyring for key changes and the global
device watch list for block and USB events:

	# keyctl add user a a @s
	1035096409
	# keyctl unlink 1035096409 @s

producing:

	# ./watch_test
	ptrs h=4 t=2 m=20003
	NOTIFY[00000004-00000002] ty=0003 sy=0002 i=01000010
	KEY 2ffc2e5d change=2[linked] aux=1035096409
	ptrs h=6 t=4 m=20003
	NOTIFY[00000006-00000004] ty=0003 sy=0003 i=01000010
	KEY 2ffc2e5d change=3[unlinked] aux=1035096409

Other events may be produced, such as with a failing disk:

	ptrs h=5 t=2 m=6000004
	NOTIFY[00000005-00000002] ty=0004 sy=0006 i=04000018
	BLOCK 00800050 e=6[critical medium] s=5be8

This corresponds to:

	print_req_error: critical medium error, dev sdf, sector 23528 flags 0

in dmesg.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 samples/Kconfig                  |    6 +
 samples/Makefile                 |    1
 samples/watch_queue/Makefile     |    8 +
 samples/watch_queue/watch_test.c |  233 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 248 insertions(+)
 create mode 100644 samples/watch_queue/Makefile
 create mode 100644 samples/watch_queue/watch_test.c

diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4dda80..2c3e07addd38 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -169,4 +169,10 @@ config SAMPLE_VFS
 	  as mount API and statx().  Note that this is restricted to the x86
 	  arch whilst it accesses system calls that aren't yet in all arches.
 
+config SAMPLE_WATCH_QUEUE
+	bool "Build example /dev/watch_queue notification consumer"
+	help
+	  Build example userspace program to use the new watch_devices()
+	  syscall and the KEYCTL_WATCH_KEY keyctl() function.
+
 endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index 7d6e4ca28d69..a61a39047d02 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_SAMPLE_TRACE_PRINTK)	+= trace_printk/
 obj-$(CONFIG_VIDEO_PCI_SKELETON)	+= v4l/
 obj-y					+= vfio-mdev/
 subdir-$(CONFIG_SAMPLE_VFS)		+= vfs
+subdir-$(CONFIG_SAMPLE_WATCH_QUEUE)	+= watch_queue
diff --git a/samples/watch_queue/Makefile b/samples/watch_queue/Makefile
new file mode 100644
index 000000000000..6ee61e3ca8d2
--- /dev/null
+++ b/samples/watch_queue/Makefile
@@ -0,0 +1,8 @@
+# List of programs to build
+hostprogs-y := watch_test
+
+# Tell kbuild to always build the programs
+always := $(hostprogs-y)
+
+HOSTCFLAGS_watch_test.o += -I$(objtree)/usr/include
+HOSTLDLIBS_watch_test += -lkeyutils
diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c
new file mode 100644
index 000000000000..6cd7101cb28c
--- /dev/null
+++ b/samples/watch_queue/watch_test.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Use /dev/watch_queue to watch for notifications.
+ *
+ * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <stdbool.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <errno.h>
+#include <sys/wait.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <poll.h>
+#include <limits.h>
+#include <linux/watch_queue.h>
+#include <linux/unistd.h>
+#include <linux/keyctl.h>
+
+#ifndef KEYCTL_WATCH_KEY
+#define KEYCTL_WATCH_KEY -1
+#endif
+#ifndef __NR_watch_devices
+#define __NR_watch_devices -1
+#endif
+
+#define BUF_SIZE 4
+
+static long keyctl_watch_key(int key, int watch_fd, int watch_id)
+{
+	return syscall(__NR_keyctl, KEYCTL_WATCH_KEY, key, watch_fd, watch_id);
+}
+
+static const char *key_subtypes[256] = {
+	[NOTIFY_KEY_INSTANTIATED]	= "instantiated",
+	[NOTIFY_KEY_UPDATED]		= "updated",
+	[NOTIFY_KEY_LINKED]		= "linked",
+	[NOTIFY_KEY_UNLINKED]		= "unlinked",
+	[NOTIFY_KEY_CLEARED]		= "cleared",
+	[NOTIFY_KEY_REVOKED]		= "revoked",
+	[NOTIFY_KEY_INVALIDATED]	= "invalidated",
+	[NOTIFY_KEY_SETATTR]		= "setattr",
+};
+
+static void saw_key_change(struct watch_notification *n)
+{
+	struct key_notification *k = (struct key_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len != sizeof(struct key_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("KEY %08x change=%u[%s] aux=%u\n",
+	       k->key_id, n->subtype, key_subtypes[n->subtype], k->aux);
+}
+
+static const char *block_subtypes[256] = {
+	[NOTIFY_BLOCK_ERROR_TIMEOUT]			= "timeout",
+	[NOTIFY_BLOCK_ERROR_NO_SPACE]			= "critical space allocation",
+	[NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT]	= "recoverable transport",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_TARGET]		= "critical target",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS]		= "critical nexus",
+	[NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM]		= "critical medium",
+	[NOTIFY_BLOCK_ERROR_PROTECTION]			= "protection",
+	[NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE]		= "kernel resource",
+	[NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE]		= "device resource",
+	[NOTIFY_BLOCK_ERROR_IO]				= "I/O",
+};
+
+static void saw_block_change(struct watch_notification *n)
+{
+	struct block_notification *b = (struct block_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len < sizeof(struct block_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("BLOCK %08llx e=%u[%s] s=%llx\n",
+	       (unsigned long long)b->dev,
+	       n->subtype, block_subtypes[n->subtype],
+	       (unsigned long long)b->sector);
+}
+
+static const char *usb_subtypes[256] = {
+	[NOTIFY_USB_DEVICE_ADD]		= "dev-add",
+	[NOTIFY_USB_DEVICE_REMOVE]	= "dev-remove",
+	[NOTIFY_USB_BUS_ADD]		= "bus-add",
+	[NOTIFY_USB_BUS_REMOVE]		= "bus-remove",
+	[NOTIFY_USB_DEVICE_RESET]	= "dev-reset",
+	[NOTIFY_USB_DEVICE_ERROR]	= "dev-error",
+};
+
+static void saw_usb_event(struct watch_notification *n)
+{
+	struct usb_notification *u = (struct usb_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len < sizeof(struct usb_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("USB %*.*s %s e=%x r=%x\n",
+	       u->name_len, u->name_len, u->name,
+	       usb_subtypes[n->subtype],
+	       u->error, u->reserved);
+}
+
+/*
+ * Consume and display events.
+ */
+static int consumer(int fd, struct watch_queue_buffer *buf)
+{
+	struct watch_notification *n;
+	struct pollfd p[1];
+	unsigned int head, tail, mask = buf->meta.mask;
+
+	for (;;) {
+		p[0].fd = fd;
+		p[0].events = POLLIN | POLLERR;
+		p[0].revents = 0;
+
+		if (poll(p, 1, -1) == -1) {
+			perror("poll");
+			break;
+		}
+
+		printf("ptrs h=%x t=%x m=%x\n",
+		       buf->meta.head, buf->meta.tail, buf->meta.mask);
+
+		while (head = __atomic_load_n(&buf->meta.head, __ATOMIC_ACQUIRE),
+		       tail = buf->meta.tail,
+		       tail != head
+		       ) {
+			n = &buf->slots[tail & mask];
+			printf("NOTIFY[%08x-%08x] ty=%04x sy=%04x i=%08x\n",
+			       head, tail, n->type, n->subtype, n->info);
+			if ((n->info & WATCH_INFO_LENGTH) == 0)
+				goto out;
+
+			switch (n->type) {
+			case WATCH_TYPE_META:
+				if (n->subtype == WATCH_META_REMOVAL_NOTIFICATION)
+					printf("REMOVAL of watchpoint %08x\n",
+					       (n->info & WATCH_INFO_ID) >>
+					       WATCH_INFO_ID__SHIFT);
+				break;
+			case WATCH_TYPE_KEY_NOTIFY:
+				saw_key_change(n);
+				break;
+			case WATCH_TYPE_BLOCK_NOTIFY:
+				saw_block_change(n);
+				break;
+			case WATCH_TYPE_USB_NOTIFY:
+				saw_usb_event(n);
+				break;
+			}
+
+			tail += (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+			__atomic_store_n(&buf->meta.tail, tail, __ATOMIC_RELEASE);
+		}
+	}
+
+out:
+	return 0;
+}
+
+static struct watch_notification_filter filter = {
+	.nr_filters	= 3,
+	.__reserved	= 0,
+	.filters = {
+		[0]	= {
+			.type			= WATCH_TYPE_KEY_NOTIFY,
+			.subtype_filter[0]	= UINT_MAX,
+		},
+		[1]	= {
+			.type			= WATCH_TYPE_BLOCK_NOTIFY,
+			.subtype_filter[0]	= UINT_MAX,
+		},
+		[2]	= {
+			.type			= WATCH_TYPE_USB_NOTIFY,
+			.subtype_filter[0]	= UINT_MAX,
+		},
+	},
+};
+
+int main(int argc, char **argv)
+{
+	struct watch_queue_buffer *buf;
+	size_t page_size;
+	int fd;
+
+	fd = open("/dev/watch_queue", O_RDWR);
+	if (fd == -1) {
+		perror("/dev/watch_queue");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE) == -1) {
+		perror("/dev/watch_queue(size)");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) == -1) {
+		perror("/dev/watch_queue(filter)");
+		exit(1);
+	}
+
+	page_size = sysconf(_SC_PAGESIZE);
+	buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
+		   MAP_SHARED, fd, 0);
+	if (buf == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	if (keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01) == -1) {
+		perror("keyctl");
+		exit(1);
+	}
+
+	if (syscall(__NR_watch_devices, fd, 0x04, 0) == -1) {
+		perror("watch_devices");
+		exit(1);
+	}
+
+	return consumer(fd, buf);
+}
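The head/tail walk in consumer() can be tried without the kernel using a small in-memory model. This is a sketch under stated assumptions, not the driver's ring: struct demo_ring, demo_post() and demo_consume() are hypothetical names, the length mask value is assumed, slots are taken to be one 8-byte header each, and the metadata record that the real buffer keeps at its front is not modelled. It uses the same GCC/Clang __atomic builtins as the sample. The producer refuses to post when space is short rather than waiting, and the consumer advances the tail by each record's length in slots.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INFO_LENGTH	0x3fu	/* assumed mask: record length in slots */
#define DEMO_NR_SLOTS		8u	/* must be a power of two */

/* One slot; only the record header is modelled here. */
struct demo_slot {
	uint32_t type;
	uint32_t info;
};

struct demo_ring {
	uint32_t head;	/* advanced by the producer */
	uint32_t tail;	/* advanced by the consumer */
	uint32_t mask;	/* DEMO_NR_SLOTS - 1 */
	struct demo_slot slots[DEMO_NR_SLOTS];
};

/* Producer: post a record of 'len' slots.  Returns 0 if it will not fit. */
static int demo_post(struct demo_ring *r, uint32_t type, uint32_t len)
{
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);
	uint32_t head = r->head;
	struct demo_slot *s;

	if (len == 0 || len > DEMO_INFO_LENGTH ||
	    (head - tail) + len > r->mask + 1)
		return 0;	/* overrun: drop the record, never wait */

	s = &r->slots[head & r->mask];
	s->type = type;
	s->info = len;
	/* Publish the record only after its header has been written. */
	__atomic_store_n(&r->head, head + len, __ATOMIC_RELEASE);
	return 1;
}

/* Consumer: record the type of each pending record, up to 'max' of them. */
static unsigned int demo_consume(struct demo_ring *r, uint32_t *types,
				 unsigned int max)
{
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);
	uint32_t tail = r->tail;
	unsigned int n = 0;

	while (tail != head && n < max) {
		const struct demo_slot *s = &r->slots[tail & r->mask];
		uint32_t len = s->info & DEMO_INFO_LENGTH;

		if (len == 0)
			break;	/* a zero-length record would never advance */
		types[n++] = s->type;
		tail += len;
		/* Hand the consumed slots back to the producer. */
		__atomic_store_n(&r->tail, tail, __ATOMIC_RELEASE);
	}
	return n;
}
```

Because head and tail are free-running counters that are only masked when indexing, head - tail gives the number of occupied slots even after the counters wrap.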
* [PATCH 09/11] Add sample notification program [ver #6] @ 2019-08-29 18:31 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:31 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner This needs to be linked with -lkeyutils. It is run like: ./watch_test and watches "/" for mount changes and the current session keyring for key changes: # keyctl add user a a @s 1035096409 # keyctl unlink 1035096409 @s producing: # ./watch_test ptrs h=4 t=2 m=20003 NOTIFY[00000004-00000002] ty=0003 sy=0002 i=01000010 KEY 2ffc2e5d change=2[linked] aux=1035096409 ptrs h=6 t=4 m=20003 NOTIFY[00000006-00000004] ty=0003 sy=0003 i=01000010 KEY 2ffc2e5d change=3[unlinked] aux=1035096409 Other events may be produced, such as with a failing disk: ptrs h=5 t=2 m=6000004 NOTIFY[00000005-00000002] ty=0004 sy=0006 i=04000018 BLOCK 00800050 e=6[critical medium] s=5be8 This corresponds to: print_req_error: critical medium error, dev sdf, sector 23528 flags 0 in dmesg. Signed-off-by: David Howells <dhowells@redhat.com> --- samples/Kconfig | 6 + samples/Makefile | 1 samples/watch_queue/Makefile | 8 + samples/watch_queue/watch_test.c | 233 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 248 insertions(+) create mode 100644 samples/watch_queue/Makefile create mode 100644 samples/watch_queue/watch_test.c diff --git a/samples/Kconfig b/samples/Kconfig index c8dacb4dda80..2c3e07addd38 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -169,4 +169,10 @@ config SAMPLE_VFS as mount API and statx(). Note that this is restricted to the x86 arch whilst it accesses system calls that aren't yet in all arches. +config SAMPLE_WATCH_QUEUE + bool "Build example /dev/watch_queue notification consumer" + help + Build example userspace program to use the new mount_notify(), + sb_notify() syscalls and the KEYCTL_WATCH_KEY keyctl() function. 
+ endif # SAMPLES diff --git a/samples/Makefile b/samples/Makefile index 7d6e4ca28d69..a61a39047d02 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -20,3 +20,4 @@ obj-$(CONFIG_SAMPLE_TRACE_PRINTK) += trace_printk/ obj-$(CONFIG_VIDEO_PCI_SKELETON) += v4l/ obj-y += vfio-mdev/ subdir-$(CONFIG_SAMPLE_VFS) += vfs +subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue diff --git a/samples/watch_queue/Makefile b/samples/watch_queue/Makefile new file mode 100644 index 000000000000..6ee61e3ca8d2 --- /dev/null +++ b/samples/watch_queue/Makefile @@ -0,0 +1,8 @@ +# List of programs to build +hostprogs-y := watch_test + +# Tell kbuild to always build the programs +always := $(hostprogs-y) + +HOSTCFLAGS_watch_test.o += -I$(objtree)/usr/include +HOSTLDLIBS_watch_test += -lkeyutils diff --git a/samples/watch_queue/watch_test.c b/samples/watch_queue/watch_test.c new file mode 100644 index 000000000000..6cd7101cb28c --- /dev/null +++ b/samples/watch_queue/watch_test.c @@ -0,0 +1,233 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Use /dev/watch_queue to watch for notifications. + * + * Copyright (C) 2019 Red Hat, Inc. All Rights Reserved. 
+ * Written by David Howells (dhowells@redhat.com) + */ + +#include <stdbool.h> +#include <stdarg.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <signal.h> +#include <unistd.h> +#include <fcntl.h> +#include <dirent.h> +#include <errno.h> +#include <sys/wait.h> +#include <sys/ioctl.h> +#include <sys/mman.h> +#include <poll.h> +#include <limits.h> +#include <linux/watch_queue.h> +#include <linux/unistd.h> +#include <linux/keyctl.h> + +#ifndef KEYCTL_WATCH_KEY +#define KEYCTL_WATCH_KEY -1 +#endif +#ifndef __NR_watch_devices +#define __NR_watch_devices -1 +#endif + +#define BUF_SIZE 4 + +static long keyctl_watch_key(int key, int watch_fd, int watch_id) +{ + return syscall(__NR_keyctl, KEYCTL_WATCH_KEY, key, watch_fd, watch_id); +} + +static const char *key_subtypes[256] = { + [NOTIFY_KEY_INSTANTIATED] = "instantiated", + [NOTIFY_KEY_UPDATED] = "updated", + [NOTIFY_KEY_LINKED] = "linked", + [NOTIFY_KEY_UNLINKED] = "unlinked", + [NOTIFY_KEY_CLEARED] = "cleared", + [NOTIFY_KEY_REVOKED] = "revoked", + [NOTIFY_KEY_INVALIDATED] = "invalidated", + [NOTIFY_KEY_SETATTR] = "setattr", +}; + +static void saw_key_change(struct watch_notification *n) +{ + struct key_notification *k = (struct key_notification *)n; + unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT; + + if (len != sizeof(struct key_notification) / WATCH_LENGTH_GRANULARITY) + return; + + printf("KEY %08x change=%u[%s] aux=%u\n", + k->key_id, n->subtype, key_subtypes[n->subtype], k->aux); +} + +static const char *block_subtypes[256] = { + [NOTIFY_BLOCK_ERROR_TIMEOUT] = "timeout", + [NOTIFY_BLOCK_ERROR_NO_SPACE] = "critical space allocation", + [NOTIFY_BLOCK_ERROR_RECOVERABLE_TRANSPORT] = "recoverable transport", + [NOTIFY_BLOCK_ERROR_CRITICAL_TARGET] = "critical target", + [NOTIFY_BLOCK_ERROR_CRITICAL_NEXUS] = "critical nexus", + [NOTIFY_BLOCK_ERROR_CRITICAL_MEDIUM] = "critical medium", + [NOTIFY_BLOCK_ERROR_PROTECTION] = "protection", + 
[NOTIFY_BLOCK_ERROR_KERNEL_RESOURCE] = "kernel resource",
+	[NOTIFY_BLOCK_ERROR_DEVICE_RESOURCE] = "device resource",
+	[NOTIFY_BLOCK_ERROR_IO] = "I/O",
+};
+
+static void saw_block_change(struct watch_notification *n)
+{
+	struct block_notification *b = (struct block_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len < sizeof(struct block_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("BLOCK %08llx e=%u[%s] s=%llx\n",
+	       (unsigned long long)b->dev,
+	       n->subtype, block_subtypes[n->subtype],
+	       (unsigned long long)b->sector);
+}
+
+static const char *usb_subtypes[256] = {
+	[NOTIFY_USB_DEVICE_ADD] = "dev-add",
+	[NOTIFY_USB_DEVICE_REMOVE] = "dev-remove",
+	[NOTIFY_USB_BUS_ADD] = "bus-add",
+	[NOTIFY_USB_BUS_REMOVE] = "bus-remove",
+	[NOTIFY_USB_DEVICE_RESET] = "dev-reset",
+	[NOTIFY_USB_DEVICE_ERROR] = "dev-error",
+};
+
+static void saw_usb_event(struct watch_notification *n)
+{
+	struct usb_notification *u = (struct usb_notification *)n;
+	unsigned int len = (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+
+	if (len < sizeof(struct usb_notification) / WATCH_LENGTH_GRANULARITY)
+		return;
+
+	printf("USB %*.*s %s e=%x r=%x\n",
+	       u->name_len, u->name_len, u->name,
+	       usb_subtypes[n->subtype],
+	       u->error, u->reserved);
+}
+
+/*
+ * Consume and display events.
+ */
+static int consumer(int fd, struct watch_queue_buffer *buf)
+{
+	struct watch_notification *n;
+	struct pollfd p[1];
+	unsigned int head, tail, mask = buf->meta.mask;
+
+	for (;;) {
+		p[0].fd = fd;
+		p[0].events = POLLIN | POLLERR;
+		p[0].revents = 0;
+
+		if (poll(p, 1, -1) == -1) {
+			perror("poll");
+			break;
+		}
+
+		printf("ptrs h=%x t=%x m=%x\n",
+		       buf->meta.head, buf->meta.tail, buf->meta.mask);
+
+		while (head = __atomic_load_n(&buf->meta.head, __ATOMIC_ACQUIRE),
+		       tail = buf->meta.tail,
+		       tail != head
+		       ) {
+			n = &buf->slots[tail & mask];
+			printf("NOTIFY[%08x-%08x] ty=%04x sy=%04x i=%08x\n",
+			       head, tail, n->type, n->subtype, n->info);
+			if ((n->info & WATCH_INFO_LENGTH) == 0)
+				goto out;
+
+			switch (n->type) {
+			case WATCH_TYPE_META:
+				if (n->subtype == WATCH_META_REMOVAL_NOTIFICATION)
+					printf("REMOVAL of watchpoint %08x\n",
+					       (n->info & WATCH_INFO_ID) >>
+					       WATCH_INFO_ID__SHIFT);
+				break;
+			case WATCH_TYPE_KEY_NOTIFY:
+				saw_key_change(n);
+				break;
+			case WATCH_TYPE_BLOCK_NOTIFY:
+				saw_block_change(n);
+				break;
+			case WATCH_TYPE_USB_NOTIFY:
+				saw_usb_event(n);
+				break;
+			}
+
+			tail += (n->info & WATCH_INFO_LENGTH) >> WATCH_INFO_LENGTH__SHIFT;
+			__atomic_store_n(&buf->meta.tail, tail, __ATOMIC_RELEASE);
+		}
+	}
+
+out:
+	return 0;
+}
+
+static struct watch_notification_filter filter = {
+	.nr_filters = 3,
+	.__reserved = 0,
+	.filters = {
+		[0] = {
+			.type = WATCH_TYPE_KEY_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+		[1] = {
+			.type = WATCH_TYPE_BLOCK_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+		[2] = {
+			.type = WATCH_TYPE_USB_NOTIFY,
+			.subtype_filter[0] = UINT_MAX,
+		},
+	},
+};
+
+int main(int argc, char **argv)
+{
+	struct watch_queue_buffer *buf;
+	size_t page_size;
+	int fd;
+
+	fd = open("/dev/watch_queue", O_RDWR);
+	if (fd == -1) {
+		perror("/dev/watch_queue");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE) == -1) {
+		perror("/dev/watch_queue(size)");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) == -1) {
+		perror("/dev/watch_queue(filter)");
+		exit(1);
+	}
+
+	page_size = sysconf(_SC_PAGESIZE);
+	buf = mmap(NULL, BUF_SIZE * page_size, PROT_READ | PROT_WRITE,
+		   MAP_SHARED, fd, 0);
+	if (buf == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	if (keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fd, 0x01) == -1) {
+		perror("keyctl");
+		exit(1);
+	}
+
+	if (syscall(__NR_watch_devices, fd, 0x04, 0) == -1) {
+		perror("watch_devices");
+		exit(1);
+	}
+
+	return consumer(fd, buf);
+}
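The index arithmetic in consumer() can be tried without the kernel side. What follows is only a sketch under stated assumptions: the slot layout, the DEMO_INFO_LENGTH mask and shift, and every demo_* name are invented here for illustration and are not the definitions from <linux/watch_queue.h> in this series. The ring is an ordinary array rather than an mmap of /dev/watch_queue, and only the header slot of each record is filled in.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, for illustration only. */
#define DEMO_NR_SLOTS		8u		/* must be a power of two */
#define DEMO_INFO_LENGTH	0x0000003fu	/* record length, in slots */
#define DEMO_INFO_LENGTH__SHIFT	0

struct demo_slot {
	uint32_t type;
	uint32_t info;
};

struct demo_ring {
	unsigned int head;	/* advanced only by the producer */
	unsigned int tail;	/* advanced only by the consumer */
	unsigned int mask;	/* DEMO_NR_SLOTS - 1 */
	struct demo_slot slots[DEMO_NR_SLOTS];
};

static void demo_ring_init(struct demo_ring *r)
{
	memset(r, 0, sizeof(*r));
	r->mask = DEMO_NR_SLOTS - 1;
}

/* Producer: append a record spanning nr_slots slots; -1 if it does not fit. */
static int demo_post(struct demo_ring *r, uint32_t type, unsigned int nr_slots)
{
	unsigned int head = r->head;
	unsigned int tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);
	struct demo_slot *s;

	if (nr_slots == 0 || nr_slots > DEMO_NR_SLOTS - (head - tail))
		return -1;

	s = &r->slots[head & r->mask];
	s->type = type;
	s->info = nr_slots << DEMO_INFO_LENGTH__SHIFT;
	/* Publish the record only after its header has been written. */
	__atomic_store_n(&r->head, head + nr_slots, __ATOMIC_RELEASE);
	return 0;
}

/*
 * Consumer: walk from tail to head, stepping by each record's length.
 * Returns the number of records seen, or -1 on a zero-length record,
 * which would otherwise make the loop spin forever.
 */
static int demo_consume(struct demo_ring *r, uint32_t *types, int max)
{
	unsigned int head, tail;
	int seen = 0;

	while (head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE),
	       tail = r->tail,
	       tail != head) {
		const struct demo_slot *n = &r->slots[tail & r->mask];
		unsigned int len = (n->info & DEMO_INFO_LENGTH) >>
				   DEMO_INFO_LENGTH__SHIFT;

		if (len == 0)
			return -1;
		if (seen < max)
			types[seen] = n->type;
		seen++;

		tail += len;
		/* Hand the consumed slots back to the producer. */
		__atomic_store_n(&r->tail, tail, __ATOMIC_RELEASE);
	}
	return seen;
}
```

The head and tail counters run freely and are masked only when indexing, so the difference head - tail stays correct across wrap-around as long as the slot count is a power of two.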
* [PATCH 10/11] selinux: Implement the watch_key security hook [ver #6]
From: David Howells @ 2019-08-29 18:31 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Implement the watch_key security hook to make sure that a key grants the
caller View permission in order to set a watch on a key.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 security/selinux/hooks.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 74dd46de01b6..371f2ebc879b 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6533,6 +6533,20 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer)
 	*_buffer = context;
 	return rc;
 }
+
+#ifdef CONFIG_KEY_NOTIFICATIONS
+static int selinux_watch_key(struct watch *watch, struct key *key)
+{
+	struct key_security_struct *ksec;
+	u32 sid;
+
+	sid = cred_sid(watch->cred);
+	ksec = key->security;
+
+	return avc_has_perm(&selinux_state,
+			    sid, ksec->sid, SECCLASS_KEY, KEY_NEED_VIEW, NULL);
+}
+#endif
 #endif
 
 #ifdef CONFIG_SECURITY_INFINIBAND
@@ -6965,6 +6979,9 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(key_free, selinux_key_free),
 	LSM_HOOK_INIT(key_permission, selinux_key_permission),
 	LSM_HOOK_INIT(key_getsecurity, selinux_key_getsecurity),
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	LSM_HOOK_INIT(watch_key, selinux_watch_key),
+#endif
 #endif
 
 #ifdef CONFIG_AUDIT
* Re: [PATCH 10/11] selinux: Implement the watch_key security hook [ver #6]
From: Stephen Smalley @ 2019-08-29 18:44 UTC
To: David Howells, viro
Cc: Casey Schaufler, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

On 8/29/19 2:31 PM, David Howells wrote:
> Implement the watch_key security hook to make sure that a key grants the
> caller View permission in order to set a watch on a key.
>
> For the moment, the watch_devices security hook is left unimplemented as
> it's not obvious what the object should be since the queue is global and
> didn't previously exist.
>
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
>
>  security/selinux/hooks.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 74dd46de01b6..371f2ebc879b 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6533,6 +6533,20 @@ static int selinux_key_getsecurity(struct key *key, char **_buffer)
>  	*_buffer = context;
>  	return rc;
>  }
> +
> +#ifdef CONFIG_KEY_NOTIFICATIONS
> +static int selinux_watch_key(struct watch *watch, struct key *key)
> +{
> +	struct key_security_struct *ksec;
> +	u32 sid;
> +
> +	sid = cred_sid(watch->cred);

Can watch->cred ever differ from current's cred here?  If not, why can't
we just use current_sid() here and why do we need the watch object at all?

> +	ksec = key->security;
> +
> +	return avc_has_perm(&selinux_state,
> +			    sid, ksec->sid, SECCLASS_KEY, KEY_NEED_VIEW, NULL);
> +}
> +#endif
>  #endif
>
>  #ifdef CONFIG_SECURITY_INFINIBAND
> @@ -6965,6 +6979,9 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(key_free, selinux_key_free),
>  	LSM_HOOK_INIT(key_permission, selinux_key_permission),
>  	LSM_HOOK_INIT(key_getsecurity, selinux_key_getsecurity),
> +#ifdef CONFIG_KEY_NOTIFICATIONS
> +	LSM_HOOK_INIT(watch_key, selinux_watch_key),
> +#endif
>  #endif
>
>  #ifdef CONFIG_AUDIT
>
* Re: [PATCH 10/11] selinux: Implement the watch_key security hook [ver #6]
From: David Howells @ 2019-08-29 19:11 UTC
To: Stephen Smalley
Cc: dhowells, viro, Casey Schaufler, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Stephen Smalley <sds@tycho.nsa.gov> wrote:

> Can watch->cred ever differ from current's cred here?  If not, why can't we
> just use current_sid() here

Um.  Not currently.  I'm not sure whether it's ever likely to be otherwise.
Probably we could just use that and fix it up later if we do find otherwise.

> and why do we need the watch object at all?

It carries more than just the creds for the caller of keyctl_watch_key(); it
also carries information about the queue to which notifications will be
written, including the creds that were active when that was set up.

Note that there's no requirement that the process that opened /dev/watch_queue
be the one that sets the watch.  In the keyutils testsuite, I 'leak' a file
descriptor from the session wrangler into the program that it runs so that
tests running inside the test script can add watches to it.

David
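The point about the opener of the queue and the setter of the watch being different processes rests on ordinary descriptor inheritance. The sketch below is an illustration only: it uses a pipe as a stand-in for /dev/watch_queue (which exists only with this series applied), a write() as a stand-in for the child calling keyctl_watch_key() on the inherited fd, and the name demo_child_uses_inherited_fd() is made up for the example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 0 if a child process could use a descriptor
 * that only the parent opened, -1 otherwise.
 */
static int demo_child_uses_inherited_fd(void)
{
	int pfd[2];
	char buf[16] = "";
	pid_t pid;
	int status;
	ssize_t got;

	/* Parent plays the "session wrangler" and opens the descriptor. */
	if (pipe(pfd) == -1)
		return -1;

	pid = fork();
	if (pid == -1) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child opened nothing itself; pfd[1] was inherited across fork(). */
		close(pfd[0]);
		_exit(write(pfd[1], "watch", 5) == 5 ? 0 : 1);
	}

	close(pfd[1]);
	if (waitpid(pid, &status, 0) == -1 ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		close(pfd[0]);
		return -1;
	}

	/* Parent sees what the child did through the shared descriptor. */
	got = read(pfd[0], buf, sizeof(buf) - 1);
	close(pfd[0]);
	if (got != 5)
		return -1;
	return strcmp(buf, "watch") == 0 ? 0 : -1;
}
```

In the scenario described above the two processes may also hold different credentials, which is one reason the watch object records both sets rather than relying on current's alone.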
* [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #6]
From: David Howells @ 2019-08-29 18:31 UTC
To: viro
Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner, keyrings, linux-usb, linux-security-module, linux-fsdevel, linux-api, linux-block, linux-kernel

Implement the watch_key security hook in Smack to make sure that a key
grants the caller Read permission in order to set a watch on a key.

Also implement the post_notification security hook to make sure that the
notification source is granted Write permission by the watch queue.

For the moment, the watch_devices security hook is left unimplemented as
it's not obvious what the object should be since the queue is global and
didn't previously exist.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/lsm_audit.h  |    1 +
 security/smack/smack_lsm.c |   81 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h
index 915330abf6e5..734d67889826 100644
--- a/include/linux/lsm_audit.h
+++ b/include/linux/lsm_audit.h
@@ -74,6 +74,7 @@ struct common_audit_data {
 #define LSM_AUDIT_DATA_FILE	12
 #define LSM_AUDIT_DATA_IBPKEY	13
 #define LSM_AUDIT_DATA_IBENDPORT 14
+#define LSM_AUDIT_DATA_NOTIFICATION 15
 	union {
 		struct path path;
 		struct dentry *dentry;
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 4c5e5a438f8b..ef3fb70649f0 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -4320,8 +4320,82 @@ static int smack_key_getsecurity(struct key *key, char **_buffer)
 	return length;
 }
 
+
+#ifdef CONFIG_KEY_NOTIFICATIONS
+/**
+ * smack_watch_key - Smack access to watch a key for notifications.
+ * @watch: The watch to be set
+ * @key: The key to be watched
+ *
+ * Return 0 if the @watch->cred has permission to read from the key object and
+ * an error otherwise.
+ */
+static int smack_watch_key(struct watch *watch, struct key *key)
+{
+	struct smk_audit_info ad;
+	struct smack_known *tkp = smk_of_task(smack_cred(watch->cred));
+	int rc;
+
+	if (key == NULL)
+		return -EINVAL;
+	/*
+	 * If the key hasn't been initialized give it access so that
+	 * it may do so.
+	 */
+	if (key->security == NULL)
+		return 0;
+	/*
+	 * This should not occur
+	 */
+	if (tkp == NULL)
+		return -EACCES;
+
+	if (smack_privileged_cred(CAP_MAC_OVERRIDE, watch->cred))
+		return 0;
+
+#ifdef CONFIG_AUDIT
+	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_KEY);
+	ad.a.u.key_struct.key = key->serial;
+	ad.a.u.key_struct.key_desc = key->description;
+#endif
+	rc = smk_access(tkp, key->security, MAY_READ, &ad);
+	rc = smk_bu_note("key watch", tkp, key->security, MAY_READ, rc);
+	return rc;
+}
+#endif /* CONFIG_KEY_NOTIFICATIONS */
 #endif /* CONFIG_KEYS */
 
+#ifdef CONFIG_WATCH_QUEUE
+/**
+ * smack_post_notification - Smack access to post a notification to a queue
+ * @w_cred: The credentials of the watcher.
+ * @cred: The credentials of the event source (may be NULL).
+ * @n: The notification message to be posted.
+ */
+static int smack_post_notification(const struct cred *w_cred,
+				   const struct cred *cred,
+				   struct watch_notification *n)
+{
+	struct smk_audit_info ad;
+	struct smack_known *subj, *obj;
+	int rc;
+
+	/* Always let maintenance notifications through. */
+	if (n->type == WATCH_TYPE_META)
+		return 0;
+
+	if (!cred)
+		return 0;
+	subj = smk_of_task(smack_cred(cred));
+	obj = smk_of_task(smack_cred(w_cred));
+
+	smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NOTIFICATION);
+	rc = smk_access(subj, obj, MAY_WRITE, &ad);
+	rc = smk_bu_note("notification", subj, obj, MAY_WRITE, rc);
+	return rc;
+}
+#endif /* CONFIG_WATCH_QUEUE */
+
 /*
  * Smack Audit hooks
  *
@@ -4710,8 +4784,15 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(key_free, smack_key_free),
 	LSM_HOOK_INIT(key_permission, smack_key_permission),
 	LSM_HOOK_INIT(key_getsecurity, smack_key_getsecurity),
+#ifdef CONFIG_KEY_NOTIFICATIONS
+	LSM_HOOK_INIT(watch_key, smack_watch_key),
+#endif
 #endif /* CONFIG_KEYS */
 
+#ifdef CONFIG_WATCH_QUEUE
+	LSM_HOOK_INIT(post_notification, smack_post_notification),
+#endif
+
 /* Audit hooks */
 #ifdef CONFIG_AUDIT
 	LSM_HOOK_INIT(audit_rule_init, smack_audit_rule_init),
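The order of the checks in smack_post_notification() can be modelled in user space. This is a sketch under loud assumptions, not the Smack implementation: struct demo_cred, the DEMO_TYPE_* values and demo_may_write() are hypothetical, and the "identical labels only" rule stands in for smk_access(), which really consults the loaded Smack rule set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_TYPE_META	0	/* stand-in for WATCH_TYPE_META */
#define DEMO_TYPE_KEY	1	/* stand-in for any non-meta type */

/* Hypothetical credential carrying just a label string. */
struct demo_cred {
	const char *label;
};

/*
 * Stand-in for smk_access(subj, obj, MAY_WRITE, ...): this simplification
 * grants write access only when both labels are identical.
 */
static int demo_may_write(const struct demo_cred *subj,
			  const struct demo_cred *obj)
{
	return strcmp(subj->label, obj->label) == 0 ? 0 : -EACCES;
}

/* Mirrors the decision order of the hook in the patch above. */
static int demo_post_notification(const struct demo_cred *w_cred,
				  const struct demo_cred *cred,
				  unsigned int type)
{
	/* Maintenance notifications are always let through. */
	if (type == DEMO_TYPE_META)
		return 0;

	/* No source credentials: the event came from the system itself. */
	if (!cred)
		return 0;

	/* Otherwise the event source must be able to write to the watcher. */
	return demo_may_write(cred, w_cred);
}
```

Note the direction of the check: the event source is the subject and the watcher is the object, matching the commit message's "the notification source is granted Write permission by the watch queue".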
* [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #6] @ 2019-08-29 18:31 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:31 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Implement the watch_key security hook in Smack to make sure that a key grants the caller Read permission in order to set a watch on a key. Also implement the post_notification security hook to make sure that the notification source is granted Write permission by the watch queue. For the moment, the watch_devices security hook is left unimplemented as it's not obvious what the object should be since the queue is global and didn't previously exist. Signed-off-by: David Howells <dhowells@redhat.com> --- include/linux/lsm_audit.h | 1 + security/smack/smack_lsm.c | 81 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 82 insertions(+) diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h index 915330abf6e5..734d67889826 100644 --- a/include/linux/lsm_audit.h +++ b/include/linux/lsm_audit.h @@ -74,6 +74,7 @@ struct common_audit_data { #define LSM_AUDIT_DATA_FILE 12 #define LSM_AUDIT_DATA_IBPKEY 13 #define LSM_AUDIT_DATA_IBENDPORT 14 +#define LSM_AUDIT_DATA_NOTIFICATION 15 union { struct path path; struct dentry *dentry; diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 4c5e5a438f8b..ef3fb70649f0 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -4320,8 +4320,82 @@ static int smack_key_getsecurity(struct key *key, char **_buffer) return length; } + +#ifdef CONFIG_KEY_NOTIFICATIONS +/** + * smack_watch_key - Smack access to watch a key for notifications. + * @watch: The watch to be set + * @key: The key to be watched + * + * Return 0 if the @watch->cred has permission to read from the key object and + * an error otherwise. 
+ */ +static int smack_watch_key(struct watch *watch, struct key *key) +{ + struct smk_audit_info ad; + struct smack_known *tkp = smk_of_task(smack_cred(watch->cred)); + int rc; + + if (key == NULL) + return -EINVAL; + /* + * If the key hasn't been initialized give it access so that + * it may do so. + */ + if (key->security == NULL) + return 0; + /* + * This should not occur + */ + if (tkp == NULL) + return -EACCES; + + if (smack_privileged_cred(CAP_MAC_OVERRIDE, watch->cred)) + return 0; + +#ifdef CONFIG_AUDIT + smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_KEY); + ad.a.u.key_struct.key = key->serial; + ad.a.u.key_struct.key_desc = key->description; +#endif + rc = smk_access(tkp, key->security, MAY_READ, &ad); + rc = smk_bu_note("key watch", tkp, key->security, MAY_READ, rc); + return rc; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ #endif /* CONFIG_KEYS */ +#ifdef CONFIG_WATCH_QUEUE +/** + * smack_post_notification - Smack access to post a notification to a queue + * @w_cred: The credentials of the watcher. + * @cred: The credentials of the event source (may be NULL). + * @n: The notification message to be posted. + */ +static int smack_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n) +{ + struct smk_audit_info ad; + struct smack_known *subj, *obj; + int rc; + + /* Always let maintenance notifications through. 
*/ + if (n->type == WATCH_TYPE_META) + return 0; + + if (!cred) + return 0; + subj = smk_of_task(smack_cred(cred)); + obj = smk_of_task(smack_cred(w_cred)); + + smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NOTIFICATION); + rc = smk_access(subj, obj, MAY_WRITE, &ad); + rc = smk_bu_note("notification", subj, obj, MAY_WRITE, rc); + return rc; +} +#endif /* CONFIG_WATCH_QUEUE */ + /* * Smack Audit hooks * @@ -4710,8 +4784,15 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(key_free, smack_key_free), LSM_HOOK_INIT(key_permission, smack_key_permission), LSM_HOOK_INIT(key_getsecurity, smack_key_getsecurity), +#ifdef CONFIG_KEY_NOTIFICATIONS + LSM_HOOK_INIT(watch_key, smack_watch_key), +#endif #endif /* CONFIG_KEYS */ +#ifdef CONFIG_WATCH_QUEUE + LSM_HOOK_INIT(post_notification, smack_post_notification), +#endif + /* Audit hooks */ #ifdef CONFIG_AUDIT LSM_HOOK_INIT(audit_rule_init, smack_audit_rule_init), ^ permalink raw reply related [flat|nested] 234+ messages in thread
* [PATCH 11/11] smack: Implement the watch_key and post_notification hooks [untested] [ver #6] @ 2019-08-29 18:31 ` David Howells 0 siblings, 0 replies; 234+ messages in thread From: David Howells @ 2019-08-29 18:31 UTC (permalink / raw) To: viro Cc: dhowells, Casey Schaufler, Stephen Smalley, Greg Kroah-Hartman, nicolas.dichtel, raven, Christian Brauner Implement the watch_key security hook in Smack to make sure that a key grants the caller Read permission in order to set a watch on a key. Also implement the post_notification security hook to make sure that the notification source is granted Write permission by the watch queue. For the moment, the watch_devices security hook is left unimplemented as it's not obvious what the object should be since the queue is global and didn't previously exist. Signed-off-by: David Howells <dhowells@redhat.com> --- include/linux/lsm_audit.h | 1 + security/smack/smack_lsm.c | 81 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 82 insertions(+) diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h index 915330abf6e5..734d67889826 100644 --- a/include/linux/lsm_audit.h +++ b/include/linux/lsm_audit.h @@ -74,6 +74,7 @@ struct common_audit_data { #define LSM_AUDIT_DATA_FILE 12 #define LSM_AUDIT_DATA_IBPKEY 13 #define LSM_AUDIT_DATA_IBENDPORT 14 +#define LSM_AUDIT_DATA_NOTIFICATION 15 union { struct path path; struct dentry *dentry; diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 4c5e5a438f8b..ef3fb70649f0 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -4320,8 +4320,82 @@ static int smack_key_getsecurity(struct key *key, char **_buffer) return length; } + +#ifdef CONFIG_KEY_NOTIFICATIONS +/** + * smack_watch_key - Smack access to watch a key for notifications. + * @watch: The watch to be set + * @key: The key to be watched + * + * Return 0 if the @watch->cred has permission to read from the key object and + * an error otherwise. 
+ */ +static int smack_watch_key(struct watch *watch, struct key *key) +{ + struct smk_audit_info ad; + struct smack_known *tkp = smk_of_task(smack_cred(watch->cred)); + int rc; + + if (key == NULL) + return -EINVAL; + /* + * If the key hasn't been initialized give it access so that + * it may do so. + */ + if (key->security == NULL) + return 0; + /* + * This should not occur + */ + if (tkp == NULL) + return -EACCES; + + if (smack_privileged_cred(CAP_MAC_OVERRIDE, watch->cred)) + return 0; + +#ifdef CONFIG_AUDIT + smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_KEY); + ad.a.u.key_struct.key = key->serial; + ad.a.u.key_struct.key_desc = key->description; +#endif + rc = smk_access(tkp, key->security, MAY_READ, &ad); + rc = smk_bu_note("key watch", tkp, key->security, MAY_READ, rc); + return rc; +} +#endif /* CONFIG_KEY_NOTIFICATIONS */ #endif /* CONFIG_KEYS */ +#ifdef CONFIG_WATCH_QUEUE +/** + * smack_post_notification - Smack access to post a notification to a queue + * @w_cred: The credentials of the watcher. + * @cred: The credentials of the event source (may be NULL). + * @n: The notification message to be posted. + */ +static int smack_post_notification(const struct cred *w_cred, + const struct cred *cred, + struct watch_notification *n) +{ + struct smk_audit_info ad; + struct smack_known *subj, *obj; + int rc; + + /* Always let maintenance notifications through.
*/ + if (n->type == WATCH_TYPE_META) + return 0; + + if (!cred) + return 0; + subj = smk_of_task(smack_cred(cred)); + obj = smk_of_task(smack_cred(w_cred)); + + smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_NOTIFICATION); + rc = smk_access(subj, obj, MAY_WRITE, &ad); + rc = smk_bu_note("notification", subj, obj, MAY_WRITE, rc); + return rc; +} +#endif /* CONFIG_WATCH_QUEUE */ + /* * Smack Audit hooks * @@ -4710,8 +4784,15 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(key_free, smack_key_free), LSM_HOOK_INIT(key_permission, smack_key_permission), LSM_HOOK_INIT(key_getsecurity, smack_key_getsecurity), +#ifdef CONFIG_KEY_NOTIFICATIONS + LSM_HOOK_INIT(watch_key, smack_watch_key), +#endif #endif /* CONFIG_KEYS */ +#ifdef CONFIG_WATCH_QUEUE + LSM_HOOK_INIT(post_notification, smack_post_notification), +#endif + /* Audit hooks */ #ifdef CONFIG_AUDIT LSM_HOOK_INIT(audit_rule_init, smack_audit_rule_init),
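As a rough user-space illustration of the two checks this patch describes (not the kernel code itself), the sketch below models which party is the subject and which is the object in each hook. Everything in it is an assumption made for illustration: the toy_ names, the rule table, the label strings and the -1 return value are hypothetical and are not Smack's API, and the shortcut that equal labels are allowed is a simplification of this model. The only behaviour taken from the patch is that setting a watch needs read access by the watcher on the key, that posting needs write access by the event source on the watcher, and that meta notifications and events without source credentials are let through.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these names are hypothetical, not Smack's kernel API. */
enum { TOY_MAY_READ = 0x4, TOY_MAY_WRITE = 0x2 };

struct toy_rule {
	const char *subject;	/* label of the party requesting access */
	const char *object;	/* label of the thing being accessed */
	int access;		/* bitmask of TOY_MAY_* bits granted */
};

/* Example policy: "user" may read "userkey"; "kernel_src" may write to "user". */
static const struct toy_rule toy_rules[] = {
	{ "user", "userkey", TOY_MAY_READ },
	{ "kernel_src", "user", TOY_MAY_WRITE },
};

/* Return 0 if subject holds every bit in 'want' on object, else -1. */
int toy_access(const char *subject, const char *object, int want)
{
	size_t i;

	if (strcmp(subject, object) == 0)
		return 0;	/* equal labels: allowed in this model */
	for (i = 0; i < sizeof(toy_rules) / sizeof(toy_rules[0]); i++) {
		const struct toy_rule *r = &toy_rules[i];

		if (strcmp(r->subject, subject) == 0 &&
		    strcmp(r->object, object) == 0 &&
		    (r->access & want) == want)
			return 0;
	}
	return -1;
}

/* Setting a watch: the watcher is the subject, the key is the object. */
int toy_watch_key(const char *watcher, const char *key_label)
{
	return toy_access(watcher, key_label, TOY_MAY_READ);
}

/* Posting: the event source is the subject, the watcher is the object. */
int toy_post_notification(const char *watcher, const char *source,
			  bool is_meta)
{
	if (is_meta)
		return 0;	/* maintenance records always pass */
	if (source == NULL)
		return 0;	/* no source credentials to check */
	return toy_access(source, watcher, TOY_MAY_WRITE);
}
```

Note the direction reverses between the two hooks: the watcher is the subject when the watch is set, but becomes the object when a notification is posted to its queue.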