linux-kernel.vger.kernel.org archive mirror
* [PATCH] random: add fork_event sysctl for polling VM forks
@ 2022-04-19 16:04 Jason A. Donenfeld
  2022-04-19 16:37 ` Jann Horn
  2022-04-21 13:35 ` [PATCH v2] " Jason A. Donenfeld
  0 siblings, 2 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-04-19 16:04 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, Alexander Graf
  Cc: Jason A. Donenfeld, Dominik Brodowski, Greg Kroah-Hartman,
	Theodore Ts'o, Jann Horn, Colm MacCarthaigh

In order to inform userspace of virtual machine forks, this commit adds
a "fork_event" sysctl, which does not return any data, but allows
userspace processes to poll() on it for notification of VM forks.

It avoids exposing the actual vmgenid from the hypervisor to userspace,
in case there is any randomness value in keeping it secret. Rather,
userspace is expected to simply use getrandom() if it wants a fresh
value.

For example, the following snippet can be used to print a message every
time a VM forks, after the RNG has been reseeded:

  struct pollfd fd = { .fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)  };
  assert(fd.fd >= 0);
  for (;;) {
    assert(poll(&fd, 1, -1) > 0);
    puts("vm fork detected");
  }

Various programs and libraries that utilize cryptographic operations
depending on fresh randomness can invalidate old keys or take other
appropriate actions when receiving that event. While this is racier than
allowing userspace to mmap/vDSO the vmgenid itself, it's an incremental
step forward that's not as heavyweight.
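
As an illustration of "take other appropriate actions", here is a minimal
sketch of what a userspace program might do once the event fires: throw away
its old key material and draw fresh bytes with getrandom(). The helper name
regenerate_key() is hypothetical and not part of this patch; the sketch only
assumes glibc's <sys/random.h> wrapper for getrandom().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper: overwrite key[0..len) with fresh random bytes.
 * Returns 0 on success and -1 if getrandom() fails for a reason other
 * than being interrupted by a signal.
 */
static int regenerate_key(unsigned char *key, size_t len)
{
	size_t done = 0;

	while (done < len) {
		/* flags == 0: block until the RNG is initialized. */
		ssize_t n = getrandom(key + done, len - done, 0);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		done += (size_t)n;
	}
	return 0;
}
```

A caller would invoke this right after poll() on the fork_event file returns,
before using the key again.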

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Alexander Graf <graf@amazon.com>
Cc: Jann Horn <jannh@google.com>
Cc: Colm MacCarthaigh <colmmacc@amazon.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 Documentation/admin-guide/sysctl/kernel.rst |  6 ++++--
 drivers/char/random.c                       | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 1144ea3229a3..ddbd603f0be7 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1001,7 +1001,7 @@ This is a directory, with the following entries:
 * ``urandom_min_reseed_secs``: obsolete (used to determine the minimum
   number of seconds between urandom pool reseeding). This file is
   writable for compatibility purposes, but writing to it has no effect
-  on any RNG behavior.
+  on any RNG behavior;
 
 * ``uuid``: a UUID generated every time this is retrieved (this can
   thus be used to generate UUIDs at will);
@@ -1009,8 +1009,10 @@ This is a directory, with the following entries:
 * ``write_wakeup_threshold``: when the entropy count drops below this
   (as a number of bits), processes waiting to write to ``/dev/random``
   are woken up. This file is writable for compatibility purposes, but
-  writing to it has no effect on any RNG behavior.
+  writing to it has no effect on any RNG behavior;
 
+* ``fork_event``: unreadable, but can be poll()'d on for notifications
+  delivered after the RNG reseeds following a virtual machine fork.
 
 randomize_va_space
 ==================
diff --git a/drivers/char/random.c b/drivers/char/random.c
index bf89c6f27a19..63fba6f042f7 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1187,6 +1187,7 @@ EXPORT_SYMBOL_GPL(add_bootloader_randomness);
 
 #if IS_ENABLED(CONFIG_VMGENID)
 static BLOCKING_NOTIFIER_HEAD(vmfork_chain);
+static DEFINE_CTL_TABLE_POLL(sysctl_fork_event_poll);
 
 /*
  * Handle a new unique VM ID, which is unique, not secret, so we
@@ -1201,6 +1202,8 @@ void add_vmfork_randomness(const void *unique_vm_id, size_t size)
 		pr_notice("crng reseeded due to virtual machine fork\n");
 	}
 	blocking_notifier_call_chain(&vmfork_chain, 0, NULL);
+	if (IS_ENABLED(CONFIG_SYSCTL))
+		proc_sys_poll_notify(&sysctl_fork_event_poll);
 }
 #if IS_MODULE(CONFIG_VMGENID)
 EXPORT_SYMBOL_GPL(add_vmfork_randomness);
@@ -1655,6 +1658,8 @@ const struct file_operations urandom_fops = {
  *   It is writable to avoid breaking old userspaces, but writing
  *   to it does not change any behavior of the RNG.
  *
+ * - fork_event - an unreadable file that can be poll()'d on for VM forks.
+ *
  ********************************************************************/
 
 #ifdef CONFIG_SYSCTL
@@ -1708,6 +1713,14 @@ static int proc_do_rointvec(struct ctl_table *table, int write, void *buffer,
 	return write ? 0 : proc_dointvec(table, 0, buffer, lenp, ppos);
 }
 
+#if IS_ENABLED(CONFIG_VMGENID)
+static int proc_do_nodata(struct ctl_table *table, int write, void *buffer,
+			  size_t *lenp, loff_t *ppos)
+{
+	return -ENODATA;
+}
+#endif
+
 static struct ctl_table random_table[] = {
 	{
 		.procname	= "poolsize",
@@ -1748,6 +1761,14 @@ static struct ctl_table random_table[] = {
 		.mode		= 0444,
 		.proc_handler	= proc_do_uuid,
 	},
+#if IS_ENABLED(CONFIG_VMGENID)
+	{
+		.procname	= "fork_event",
+		.mode		= 0444,
+		.poll		= &sysctl_fork_event_poll,
+		.proc_handler	= proc_do_nodata,
+	},
+#endif
 	{ }
 };
 
-- 
2.35.1


* Re: [PATCH] random: add fork_event sysctl for polling VM forks
  2022-04-19 16:04 [PATCH] random: add fork_event sysctl for polling VM forks Jason A. Donenfeld
@ 2022-04-19 16:37 ` Jann Horn
  2022-04-19 16:42   ` Jason A. Donenfeld
  2022-04-21 13:35 ` [PATCH v2] " Jason A. Donenfeld
  1 sibling, 1 reply; 9+ messages in thread
From: Jann Horn @ 2022-04-19 16:37 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-crypto, Alexander Graf, Dominik Brodowski,
	Greg Kroah-Hartman, Theodore Ts'o, Colm MacCarthaigh

On Tue, Apr 19, 2022 at 6:04 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> In order to inform userspace of virtual machine forks, this commit adds
> a "fork_event" sysctl, which does not return any data, but allows
> userspace processes to poll() on it for notification of VM forks.
>
> It avoids exposing the actual vmgenid from the hypervisor to userspace,
> in case there is any randomness value in keeping it secret. Rather,
> userspace is expected to simply use getrandom() if it wants a fresh
> value.
>
> For example, the following snippet can be used to print a message every
> time a VM forks, after the RNG has been reseeded:
>
>   struct pollfd fd = { .fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)  };
>   assert(fd.fd >= 0);
>   for (;;) {
>     assert(poll(&fd, 1, -1) > 0);
>     puts("vm fork detected");
>   }

This is a bit of a weird API, because normally .poll is supposed to be
level-triggered rather than edge-triggered... and AFAIK things like
epoll also kinda assume that ->poll() doesn't modify state (but that
only _really_ matters in weird cases). But at the same time, it looks
like the existing proc_sys_poll() already goes against that? So I
don't know what the right thing to do there is...

* Re: [PATCH] random: add fork_event sysctl for polling VM forks
  2022-04-19 16:37 ` Jann Horn
@ 2022-04-19 16:42   ` Jason A. Donenfeld
  2022-04-19 19:44     ` Jann Horn
  0 siblings, 1 reply; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-04-19 16:42 UTC (permalink / raw)
  To: Jann Horn
  Cc: LKML, Linux Crypto Mailing List, Alexander Graf,
	Dominik Brodowski, Greg Kroah-Hartman, Theodore Ts'o,
	Colm MacCarthaigh

Hey Jann,

On Tue, Apr 19, 2022 at 6:38 PM Jann Horn <jannh@google.com> wrote:
> This is a bit of a weird API, because normally .poll is supposed to be
> level-triggered rather than edge-triggered... and AFAIK things like
> epoll also kinda assume that ->poll() doesn't modify state (but that
> only _really_ matters in weird cases). But at the same time, it looks
> like the existing proc_sys_poll() already goes against that? So I
> don't know what the right thing to do there is...

Doesn't the level vs edge distinction apply to POLLIN/POLLOUT events?
In this case, the event generated is actually POLLERR. On one hand,
this is sort of weird. On the other hand, it perhaps makes sense,
since nothing changes with respect to its readability/writability. And it
also happens to be how the sysctl poll() infrastructure was designed;
I didn't need to change anything for this behavior, and it comes as a
result of this rather trivial commit only. Looking at where else it's
used, it appears to be the intended use case for changes to
hostname/domainname. So while it's unusual, it also appears to be the
usual way that sysctl poll() works. So perhaps we're quite lucky here
in that sysctl poll() winds up being the correct interface for what we
want?
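
A minimal sketch of a waiter that looks only at the POLLERR bit, assuming (as
described above) that the sysctl poll handler signals a change through
POLLERR. The helper name wait_for_pollerr() is made up for illustration. With
.events left at 0, poll() still reports POLLERR, POLLHUP and POLLNVAL, which
is why no event needs to be requested explicitly.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for POLLERR on fd.
 * Returns 1 if POLLERR was reported, 0 on timeout (or if only some
 * other condition was reported), and -1 if poll() failed or the fd
 * is not open.
 */
static int wait_for_pollerr(int fd, int timeout_ms)
{
	/* No requested events: only POLLERR/POLLHUP/POLLNVAL can come back. */
	struct pollfd pfd = { .fd = fd, .events = 0 };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0 || (pfd.revents & POLLNVAL))
		return -1;
	return (ret > 0 && (pfd.revents & POLLERR)) ? 1 : 0;
}
```

A caller would pass the fd from
open("/proc/sys/kernel/random/fork_event", O_RDONLY) and a timeout of -1.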

Jason

* Re: [PATCH] random: add fork_event sysctl for polling VM forks
  2022-04-19 16:42   ` Jason A. Donenfeld
@ 2022-04-19 19:44     ` Jann Horn
  2022-04-20  0:15       ` Jason A. Donenfeld
  0 siblings, 1 reply; 9+ messages in thread
From: Jann Horn @ 2022-04-19 19:44 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: LKML, Linux Crypto Mailing List, Alexander Graf,
	Dominik Brodowski, Greg Kroah-Hartman, Theodore Ts'o,
	Colm MacCarthaigh

On Tue, Apr 19, 2022 at 6:42 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hey Jann,
>
> On Tue, Apr 19, 2022 at 6:38 PM Jann Horn <jannh@google.com> wrote:
> > This is a bit of a weird API, because normally .poll is supposed to be
> > level-triggered rather than edge-triggered... and AFAIK things like
> > epoll also kinda assume that ->poll() doesn't modify state (but that
> > only _really_ matters in weird cases). But at the same time, it looks
> > like the existing proc_sys_poll() already goes against that? So I
> > don't know what the right thing to do there is...
>
> Doesn't the level vs edge distinction apply to POLLIN/POLLOUT events?

I don't see why it would be limited to that.

> In this case, the event generated is actually POLLERR. On one hand,
> this is sort of weird. On the other hand, it perhaps makes sense,
> since nothing changes with respect to its readability/writability. And it
> also happens to be how the sysctl poll() infrastructure was designed;
> I didn't need to change anything for this behavior, and it comes as a
> result of this rather trivial commit only. Looking at where else it's
> used, it appears to be the intended use case for changes to
> hostname/domainname. So while it's unusual, it also appears to be the
> usual way that sysctl poll() works. So perhaps we're quite lucky here
> in that sysctl poll() winds up being the correct interface for what we
> want?

AFAIK this also means that if you make an epoll watch for
/proc/sys/kernel/random/fork_event, and then call poll() *on the epoll
fd* for some reason, that will probably already consume the event; and
if you then try to actually receive the epoll event via epoll_wait(),
it'll already be gone (because epoll tries to re-poll the "ready"
files to figure out what state those files are at now). Similarly if
you try to create an epoll watch for an FD that already has an event
pending: Installing the watch will call the ->poll handler once,
resetting the file's state, and the following epoll_wait() will call
->poll again and think the event is already gone. See the call paths
to vfs_poll() in fs/eventpoll.c.

Maybe we don't care about such exotic usage, and are willing to accept
the UAPI inconsistency and slight epoll breakage of plumbing
edge-triggered polling through APIs designed for level-triggered
polling. IDK.
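
To make the "consumed event" problem concrete, here is a toy userspace model
of a poll handler that modifies state. It is only a sketch of the behavior
described above, not kernel code: all names (toy_ctl, toy_file, toy_poll,
TOY_POLLERR) are invented, and the real proc_sys_poll() may differ in detail.
The point is that the first poll after a notification reports the event and
updates the per-file snapshot, so any second poll sees nothing.

```c
#include <stddef.h>

#define TOY_POLLERR 0x008	/* stand-in for POLLERR, value is arbitrary here */

struct toy_ctl  { unsigned long event; };	/* bumped on every notification */
struct toy_file { unsigned long seen; };	/* snapshot taken per open file */

/* Opening the file records the current event counter. */
static void toy_open(struct toy_file *f, const struct toy_ctl *c)
{
	f->seen = c->event;
}

/* A notification (e.g. a VM fork) bumps the counter. */
static void toy_notify(struct toy_ctl *c)
{
	c->event++;
}

/*
 * Edge-style poll: reports the change once and, as a side effect,
 * updates the snapshot. A level-triggered handler would instead keep
 * reporting until userspace explicitly acknowledged the event.
 */
static int toy_poll(struct toy_file *f, const struct toy_ctl *c)
{
	if (f->seen == c->event)
		return 0;
	f->seen = c->event;	/* the event is consumed here */
	return TOY_POLLERR;
}
```

In this model, whichever caller polls first (epoll installing a watch, or
poll() on the epoll fd) takes the event, and a later epoll_wait() that
re-polls the file gets 0.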

* Re: [PATCH] random: add fork_event sysctl for polling VM forks
  2022-04-19 19:44     ` Jann Horn
@ 2022-04-20  0:15       ` Jason A. Donenfeld
  2022-04-20 13:24         ` Jason A. Donenfeld
  0 siblings, 1 reply; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-04-20  0:15 UTC (permalink / raw)
  To: Jann Horn
  Cc: LKML, Linux Crypto Mailing List, Alexander Graf,
	Dominik Brodowski, Greg Kroah-Hartman, Theodore Ts'o,
	Colm MacCarthaigh

Hi Jann,

On Tue, Apr 19, 2022 at 9:45 PM Jann Horn <jannh@google.com> wrote:
> AFAIK this also means that if you make an epoll watch for
> /proc/sys/kernel/random/fork_event, and then call poll() *on the epoll
> fd* for some reason, that will probably already consume the event; and
> if you then try to actually receive the epoll event via epoll_wait(),
> it'll already be gone (because epoll tries to re-poll the "ready"
> files to figure out what state those files are at now). Similarly if
> you try to create an epoll watch for an FD that already has an event
> pending: Installing the watch will call the ->poll handler once,
> resetting the file's state, and the following epoll_wait() will call
> ->poll again and think the event is already gone. See the call paths
> to vfs_poll() in fs/eventpoll.c.
>
> Maybe we don't care about such exotic usage, and are willing to accept
> the UAPI inconsistency and slight epoll breakage of plumbing
> edge-triggered polling through APIs designed for level-triggered
> polling. IDK.

Hmm, I see. The thing is, this is _already_ what's done for
domainname/hostname. It's how the sysctl poll handler was "designed".
So our options here are:

a) Remove this quirky behavior from domainname/hostname and start
over. This would potentially break userspace, but maybe nobody uses
this? No idea, but sounds risky.

b) Apply this commit as-is, because it's using the API as the API was
designed, and call it a day.

c) Apply this commit as-is, because it's using the API as the API was
designed, and then later try to fix up the epoll behavior on this.

Of these, (a) seems like a non-starter. (c) is most appealing, but it
sounds like it might not actually be possible?

Jason

* Re: [PATCH] random: add fork_event sysctl for polling VM forks
  2022-04-20  0:15       ` Jason A. Donenfeld
@ 2022-04-20 13:24         ` Jason A. Donenfeld
  2022-04-20 14:23           ` Jann Horn
  0 siblings, 1 reply; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-04-20 13:24 UTC (permalink / raw)
  To: Jann Horn
  Cc: LKML, Linux Crypto Mailing List, Alexander Graf,
	Dominik Brodowski, Greg Kroah-Hartman, Theodore Ts'o,
	Colm MacCarthaigh

Hey again,

On Wed, Apr 20, 2022 at 02:15:45AM +0200, Jason A. Donenfeld wrote:
> Hi Jann,
> 
> On Tue, Apr 19, 2022 at 9:45 PM Jann Horn <jannh@google.com> wrote:
> > AFAIK this also means that if you make an epoll watch for
> > /proc/sys/kernel/random/fork_event, and then call poll() *on the epoll
> > fd* for some reason, that will probably already consume the event; and
> > if you then try to actually receive the epoll event via epoll_wait(),
> > it'll already be gone (because epoll tries to re-poll the "ready"
> > files to figure out what state those files are at now). Similarly if
> > you try to create an epoll watch for an FD that already has an event
> > pending: Installing the watch will call the ->poll handler once,
> > resetting the file's state, and the following epoll_wait() will call
> > ->poll again and think the event is already gone. See the call paths
> > to vfs_poll() in fs/eventpoll.c.
> >
> > Maybe we don't care about such exotic usage, and are willing to accept
> > the UAPI inconsistency and slight epoll breakage of plumbing
> > edge-triggered polling through APIs designed for level-triggered
> > polling. IDK.
> 
> Hmm, I see. The thing is, this is _already_ what's done for
> domainname/hostname. It's how the sysctl poll handler was "designed".
> So our options here are:
> 
> a) Remove this quirky behavior from domainname/hostname and start
> over. This would potentially break userspace, but maybe nobody uses
> this? No idea, but sounds risky.
> 
> b) Apply this commit as-is, because it's using the API as the API was
> designed, and call it a day.
> 
> c) Apply this commit as-is, because it's using the API as the API was
> designed, and then later try to fix up the epoll behavior on this.
> 
> Of these, (a) seems like a non-starter. (c) is most appealing, but it
> sounds like it might not actually be possible?
> 
> Jason

I actually tried to verify your concern but didn't have success doing
so.

Both of these worked:

        int efd = epoll_create1(0);
        assert(efd >= 0);
        struct epoll_event event = {
                .data.fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)
        };
        assert(event.data.fd >= 0);
        assert(epoll_ctl(efd, EPOLL_CTL_ADD, event.data.fd, &event) == 0);
        for (;;) {
                assert(epoll_wait(efd, &event, 1, -1) == 1);
                puts("vm fork detected");
        }

And:

        int efd = epoll_create1(0);
        assert(efd >= 0);
        struct epoll_event event = {
                .data.fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)
        };
        assert(event.data.fd >= 0);
        assert(epoll_ctl(efd, EPOLL_CTL_ADD, event.data.fd, &event) == 0);
        for (;;) {
                assert(poll(&(struct pollfd){ .fd = efd, .events = POLLIN }, 1, -1) == 1);
                puts("vm fork detected");
        }

It also worked if I added EPOLLET to the epoll_event. It did not work if
I removed POLLIN from the pollfd event.

Maybe I'm missing some subtlety. But what exactly is broken? (Either
way, it doesn't change the (a) vs (c) calculus in my previous email.)

Jason

* Re: [PATCH] random: add fork_event sysctl for polling VM forks
  2022-04-20 13:24         ` Jason A. Donenfeld
@ 2022-04-20 14:23           ` Jann Horn
  2022-04-20 16:37             ` Jason A. Donenfeld
  0 siblings, 1 reply; 9+ messages in thread
From: Jann Horn @ 2022-04-20 14:23 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: LKML, Linux Crypto Mailing List, Alexander Graf,
	Dominik Brodowski, Greg Kroah-Hartman, Theodore Ts'o,
	Colm MacCarthaigh

On Wed, Apr 20, 2022 at 3:25 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hey again,
>
> On Wed, Apr 20, 2022 at 02:15:45AM +0200, Jason A. Donenfeld wrote:
> > Hi Jann,
> >
> > On Tue, Apr 19, 2022 at 9:45 PM Jann Horn <jannh@google.com> wrote:
> > > AFAIK this also means that if you make an epoll watch for
> > > /proc/sys/kernel/random/fork_event, and then call poll() *on the epoll
> > > fd* for some reason, that will probably already consume the event; and
> > > if you then try to actually receive the epoll event via epoll_wait(),
> > > it'll already be gone (because epoll tries to re-poll the "ready"
> > > files to figure out what state those files are at now). Similarly if
> > > you try to create an epoll watch for an FD that already has an event
> > > pending: Installing the watch will call the ->poll handler once,
> > > resetting the file's state, and the following epoll_wait() will call
> > > ->poll again and think the event is already gone. See the call paths
> > > to vfs_poll() in fs/eventpoll.c.
> > >
> > > Maybe we don't care about such exotic usage, and are willing to accept
> > > the UAPI inconsistency and slight epoll breakage of plumbing
> > > edge-triggered polling through APIs designed for level-triggered
> > > polling. IDK.
> >
> > Hmm, I see. The thing is, this is _already_ what's done for
> > domainname/hostname. It's how the sysctl poll handler was "designed".
> > So our options here are:
> >
> > a) Remove this quirky behavior from domainname/hostname and start
> > over. This would potentially break userspace, but maybe nobody uses
> > this? No idea, but sounds risky.
> >
> > b) Apply this commit as-is, because it's using the API as the API was
> > designed, and call it a day.
> >
> > c) Apply this commit as-is, because it's using the API as the API was
> > designed, and then later try to fix up the epoll behavior on this.
> >
> > Of these, (a) seems like a non-starter. (c) is most appealing, but it
> > sounds like it might not actually be possible?
> >
> > Jason
>
> I actually tried to verify your concern but didn't have success doing
> so.

My point is that when you run this code:

$ cat edgepoll.c
#include <time.h>
#include <stdio.h>
#include <fcntl.h>
#include <err.h>
#include <unistd.h>
#include <poll.h>
#include <sys/epoll.h>

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

int main(void) {
  int epfd = SYSCHK(epoll_create1(0));
  int hostname_fd = SYSCHK(open("/proc/sys/kernel/hostname", O_RDONLY));
  struct epoll_event event = { .events = EPOLLERR, .data = { .u32 = 1234 } };
  SYSCHK(epoll_ctl(epfd, EPOLL_CTL_ADD, hostname_fd, &event));

  while (1) {
    struct pollfd pollfds[1] = { { .fd = epfd, .events = POLLIN } };
    int poll_res = poll(pollfds, 1, -1);
    if (poll_res == -1) {
      perror("poll() error");
      continue;
    }
    if (poll_res == 0) {
      printf("poll(): no events ready (can't happen, we're using timeout=-1)\n");
      continue;
    }
    struct epoll_event events[1];
    int epoll_res = epoll_wait(epfd, events, 1, 0);
    if (epoll_res == -1) {
      perror("epoll error");
      continue;
    }
    if (epoll_res == 0) {
      printf("spurious epoll readiness\n");
      continue;
    }
    printf("got epoll fd readiness: events=0x%x, u32=%u\n",
           events[0].events, events[0].data.u32);
  }
}
$ gcc -o edgepoll edgepoll.c
$ ./edgepoll

and then change the hostname, you'll just get "spurious epoll
readiness" logged - simply calling poll() on the epoll FD resets the
state of the hostname file that is being polled, so when we then try
to receive the epoll event with epoll_wait(), the event is gone.


And the other case is this:

$ cat edgepoll2.c
#include <time.h>
#include <stdio.h>
#include <fcntl.h>
#include <err.h>
#include <unistd.h>
#include <poll.h>
#include <sys/epoll.h>

#define SYSCHK(x) ({          \
  typeof(x) __res = (x);      \
  if (__res == (typeof(x))-1) \
    err(1, "SYSCHK(" #x ")"); \
  __res;                      \
})

int main(void) {
  int epfd = SYSCHK(epoll_create1(0));
  int hostname_fd = SYSCHK(open("/proc/sys/kernel/hostname", O_RDONLY));
  printf("opened hostname fd, sleeping\n");
  sleep(10);
  printf("done sleeping\n");
  struct epoll_event event = { .events = EPOLLERR, .data = { .u32 = 1234 } };
  SYSCHK(epoll_ctl(epfd, EPOLL_CTL_ADD, hostname_fd, &event));

  struct epoll_event events[1];
  int epoll_res = SYSCHK(epoll_wait(epfd, events, 1, 0));
  if (epoll_res == 0)
    errx(1, "no epoll events ready");
  printf("got epoll fd readiness: events=0x%x, u32=%u\n",
         events[0].events, events[0].data.u32);
}
$ gcc -o edgepoll2 edgepoll2.c
$ ./edgepoll2
opened hostname fd, sleeping
done sleeping
edgepoll2: no epoll events ready
$

If you change the hostname when "opened hostname fd, sleeping" is
printed, it'll still say "edgepoll2: no epoll events ready", because
the EPOLL_CTL_ADD basically consumed the event.

* Re: [PATCH] random: add fork_event sysctl for polling VM forks
  2022-04-20 14:23           ` Jann Horn
@ 2022-04-20 16:37             ` Jason A. Donenfeld
  0 siblings, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-04-20 16:37 UTC (permalink / raw)
  To: Jann Horn
  Cc: LKML, Linux Crypto Mailing List, Alexander Graf,
	Dominik Brodowski, Greg Kroah-Hartman, Theodore Ts'o,
	Colm MacCarthaigh

Hey Jann,

Ahh, gotcha, that makes sense. Either way, sounds like something to
fix in the sysctl proc API (option c) if possible...

Jason

* [PATCH v2] random: add fork_event sysctl for polling VM forks
  2022-04-19 16:04 [PATCH] random: add fork_event sysctl for polling VM forks Jason A. Donenfeld
  2022-04-19 16:37 ` Jann Horn
@ 2022-04-21 13:35 ` Jason A. Donenfeld
  1 sibling, 0 replies; 9+ messages in thread
From: Jason A. Donenfeld @ 2022-04-21 13:35 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, Alexander Graf, Colm MacCarthaigh
  Cc: Jason A. Donenfeld, Dominik Brodowski, Greg Kroah-Hartman,
	Theodore Ts'o, Jann Horn

In order to inform userspace of virtual machine forks, this commit adds
a "fork_event" sysctl, which does not return any data, but allows
userspace processes to poll() on it for notification of VM forks.

It avoids exposing the actual vmgenid from the hypervisor to userspace,
in case there is any randomness value in keeping it secret. Rather,
userspace is expected to simply use getrandom() if it wants a fresh
value.

For example, the following snippet can be used to print a message every
time a VM forks, after the RNG has been reseeded:

  struct pollfd fd = { .fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)  };
  assert(fd.fd >= 0);
  for (;;) {
    assert(poll(&fd, 1, -1) > 0);
    puts("vm fork detected");
  }

Various programs and libraries that utilize cryptographic operations
depending on fresh randomness can invalidate old keys or take other
appropriate actions when receiving that event. While this is racier than
allowing userspace to mmap/vDSO the vmgenid itself, it's an incremental
step forward that's not as heavyweight.

Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Alexander Graf <graf@amazon.com>
Cc: Jann Horn <jannh@google.com>
Cc: Colm MacCarthaigh <colmmacc@amazon.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Alex/Colm - Could you guys let me know if this satisfies what you'd need
it for at Amazon?

Changes v1->v2:
- Some small fixes with the CONFIG_SYSCTL ifdef.

 Documentation/admin-guide/sysctl/kernel.rst |  6 ++++--
 drivers/char/random.c                       | 24 +++++++++++++++++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 1144ea3229a3..ddbd603f0be7 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1001,7 +1001,7 @@ This is a directory, with the following entries:
 * ``urandom_min_reseed_secs``: obsolete (used to determine the minimum
   number of seconds between urandom pool reseeding). This file is
   writable for compatibility purposes, but writing to it has no effect
-  on any RNG behavior.
+  on any RNG behavior;
 
 * ``uuid``: a UUID generated every time this is retrieved (this can
   thus be used to generate UUIDs at will);
@@ -1009,8 +1009,10 @@ This is a directory, with the following entries:
 * ``write_wakeup_threshold``: when the entropy count drops below this
   (as a number of bits), processes waiting to write to ``/dev/random``
   are woken up. This file is writable for compatibility purposes, but
-  writing to it has no effect on any RNG behavior.
+  writing to it has no effect on any RNG behavior;
 
+* ``fork_event``: unreadable, but can be poll()'d on for notifications
+  delivered after the RNG reseeds following a virtual machine fork.
 
 randomize_va_space
 ==================
diff --git a/drivers/char/random.c b/drivers/char/random.c
index bf89c6f27a19..36196e463b90 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1187,6 +1187,9 @@ EXPORT_SYMBOL_GPL(add_bootloader_randomness);
 
 #if IS_ENABLED(CONFIG_VMGENID)
 static BLOCKING_NOTIFIER_HEAD(vmfork_chain);
+#ifdef CONFIG_SYSCTL
+static DEFINE_CTL_TABLE_POLL(sysctl_fork_event_poll);
+#endif
 
 /*
  * Handle a new unique VM ID, which is unique, not secret, so we
@@ -1201,6 +1204,9 @@ void add_vmfork_randomness(const void *unique_vm_id, size_t size)
 		pr_notice("crng reseeded due to virtual machine fork\n");
 	}
 	blocking_notifier_call_chain(&vmfork_chain, 0, NULL);
+#ifdef CONFIG_SYSCTL
+	proc_sys_poll_notify(&sysctl_fork_event_poll);
+#endif
 }
 #if IS_MODULE(CONFIG_VMGENID)
 EXPORT_SYMBOL_GPL(add_vmfork_randomness);
@@ -1655,6 +1661,8 @@ const struct file_operations urandom_fops = {
  *   It is writable to avoid breaking old userspaces, but writing
  *   to it does not change any behavior of the RNG.
  *
+ * - fork_event - an unreadable file that can be poll()'d on for VM forks.
+ *
  ********************************************************************/
 
 #ifdef CONFIG_SYSCTL
@@ -1708,6 +1716,14 @@ static int proc_do_rointvec(struct ctl_table *table, int write, void *buffer,
 	return write ? 0 : proc_dointvec(table, 0, buffer, lenp, ppos);
 }
 
+#if IS_ENABLED(CONFIG_VMGENID)
+static int proc_do_nodata(struct ctl_table *table, int write, void *buffer,
+			  size_t *lenp, loff_t *ppos)
+{
+	return -ENODATA;
+}
+#endif
+
 static struct ctl_table random_table[] = {
 	{
 		.procname	= "poolsize",
@@ -1748,6 +1764,14 @@ static struct ctl_table random_table[] = {
 		.mode		= 0444,
 		.proc_handler	= proc_do_uuid,
 	},
+#if IS_ENABLED(CONFIG_VMGENID)
+	{
+		.procname	= "fork_event",
+		.mode		= 0444,
+		.poll		= &sysctl_fork_event_poll,
+		.proc_handler	= proc_do_nodata,
+	},
+#endif
 	{ }
 };
 
-- 
2.35.1

