All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alexander Graf <graf@amazon.com>
Cc: Adrian Catangiu <acatan@amazon.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	qemu-devel@nongnu.org, kvm@vger.kernel.org,
	linux-s390@vger.kernel.org, gregkh@linuxfoundation.org,
	rdunlap@infradead.org, arnd@arndb.de, ebiederm@xmission.com,
	rppt@kernel.org, 0x7f454c46@gmail.com, borntraeger@de.ibm.com,
	Jason@zx2c4.com, jannh@google.com, w@1wt.eu, colmmacc@amazon.com,
	luto@kernel.org, tytso@mit.edu, ebiggers@kernel.org,
	dwmw@amazon.co.uk, bonzini@gnu.org, sblbir@amazon.com,
	raduweis@amazon.com, corbet@lwn.net, mhocko@kernel.org,
	rafael@kernel.org, pavel@ucw.cz, mpe@ellerman.id.au,
	areber@redhat.com, ovzxemul@gmail.com, avagin@gmail.com,
	ptikhomirov@virtuozzo.com, gil@azul.com, asmehra@redhat.com,
	dgunigun@redhat.com, vijaysun@ca.ibm.com, oridgar@gmail.com,
	ghammer@redhat.com
Subject: Re: [PATCH v7 1/2] drivers/misc: sysgenid: add system generation id driver
Date: Wed, 24 Feb 2021 17:41:59 -0500	[thread overview]
Message-ID: <20210224173205-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <d63146a9-a3f8-14ea-2b16-cb5b3fe7aecf@amazon.com>

On Wed, Feb 24, 2021 at 02:45:03PM +0100, Alexander Graf wrote:
> > Above should try harder to explan what are the things that need to be
> > scrubbed and why. For example, I personally don't really know what is
> > the OpenSSL session token example and what makes it vulnerable. I guess
> > snapshots can attack each other?
> > 
> > 
> > 
> > 
> > Here's a simple example of a workflow that submits transactions
> > to a database and wants to avoid duplicate transactions.
> > This does not require overseer magic. It does however require
> > a correct genid from hypervisor, so no mmap tricks work.
> > 
> > 
> > 
> >          int genid, oldgenid;
> >          read(&genid);
> > start:
> >          oldgenid = genid;
> >          transid = submit transaction
> >          read(&genid);
> >          if (genid != oldgenid) {
> >                          revert transaction (transid);
> >                          goto start:
> >          }
> 
> I'm not sure I fully follow. For starters, if this is a VM local database, I
> don't think you'd care about the genid. If it's a remote database, your
> connection would get dropped already at the point when you clone/resume,
> because TCP and your connection state machine will get really confused when
> you suddenly have a different IP address or two consumers of the same stream
> :).
>
> But for the sake of the argument, let's assume you can have a connectionless
> database connection that maintains its own connection uniqueness logic.

Right. E.g. not uncommon with REST APIs. They survive disconnect easily
and use cookies or such.

> That
> database connector would need to understand how to abort the connection (and
> thus the transaction!) when the generation changes.

the point is that instead of all that you discover transaction as
a duplicate and revert it.


> And that's logic you
> would do with the read/write/notify mechanism. So your main loop would check
> for reads on the genid fd and after sending a connection termination, notify
> the overlord that it's safe to use the VM now.
> 
> The OpenSSL case (with mmap) is for libraries that are stateless and can not
> guarantee that they receive a genid notification event timely.
> 
> Since you asked, this is mainly important for the PRNG. Imagine an https
> server. You create a snapshot. You resume from that snapshot. OpenSSL is
> fully initialized with a user space PRNG randomness pool that it considers
> safe to consume. However, that means your first connection after resume will
> be 100% predictable randomness wise.

I wonder whether something similar is possible here. I.e. use the secret
to encrypt stuff but check the gen ID before actually sending data.
If it changed re-encrypt. Hmm?

> 
> The mmap mechanism allows the PRNG to reseed after a genid change. Because
> we don't have an event mechanism for this code path, that can happen minutes
> after the resume. But that's ok, we "just" have to ensure that nobody is
> consuming secret data at the point of the snapshot.


Something I am still not clear on is whether it's really important to
skip the system call here. If not I think it's prudent to just stick
to read for now, I think there's a slightly lower chance that
it will get misused. mmap which gives you a laggy gen id value
really seems like it would be hard to use correctly.


> > 
> > 
> > 
> > 
> > 
> > 
> > > +Simplifyng assumption - safety prerequisite
> > > +-------------------------------------------
> > > +
> > > +**Control the snapshot flow**, disallow snapshots coming at arbitrary
> > > +moments in the workload lifetime.
> > > +
> > > +Use a system-level overseer entity that quiesces the system before
> > > +snapshot, and post-snapshot-resume oversees that software components
> > > +have readjusted to new environment, to the new generation. Only after,
> > > +will the overseer un-quiesce the system and allow active workloads.
> > > +
> > > +Software components can choose whether they want to be tracked and
> > > +waited on by the overseer by using the ``SYSGENID_SET_WATCHER_TRACKING``
> > > +IOCTL.
> > > +
> > > +The sysgenid framework standardizes the API for system software to
> > > +find out about needing to readjust and at the same time provides a
> > > +mechanism for the overseer entity to wait for everyone to be done, the
> > > +system to have readjusted, so it can un-quiesce.
> > > +
> > > +Example snapshot-safe workflow
> > > +------------------------------
> > > +
> > > +1) Before taking a snapshot, quiesce the VM/container/system. Exactly
> > > +   how this is achieved is very workload-specific, but the general
> > > +   description is to get all software to an expected state where their
> > > +   event loops dry up and they are effectively quiesced.
> > 
> > If you have ability to do this by communicating with
> > all processes e.g. through a unix domain socket,
> > why do you need the rest of the stuff in the kernel?
> > Quescing is a harder problem than waking up.
> 
> That depends. Think of a typical VM workload. Let's take the web server
> example again. You can preboot the full VM and snapshot it as is. As long as
> you don't allow any incoming connections, you can guarantee that the system
> is "quiesced" well enough for the snapshot.

Well you can use a firewall or such to block incoming packets,
but I am not at all sure that means e.g. all socket buffers
are empty.


> This is really what this bullet point is about. The point is that you're not
> consuming randomness you can't reseed asynchronously (see the above OpenSSL
> PRNG example).
> 
> 
> Alex
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 


WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alexander Graf <graf@amazon.com>
Cc: Jason@zx2c4.com, areber@redhat.com, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org, ghammer@redhat.com,
	vijaysun@ca.ibm.com, 0x7f454c46@gmail.com, qemu-devel@nongnu.org,
	mhocko@kernel.org, dgunigun@redhat.com, avagin@gmail.com,
	pavel@ucw.cz, ptikhomirov@virtuozzo.com,
	linux-s390@vger.kernel.org, corbet@lwn.net, mpe@ellerman.id.au,
	rafael@kernel.org, ebiggers@kernel.org, borntraeger@de.ibm.com,
	sblbir@amazon.com, bonzini@gnu.org, arnd@arndb.de,
	jannh@google.com, raduweis@amazon.com, asmehra@redhat.com,
	Adrian Catangiu <acatan@amazon.com>,
	rppt@kernel.org, luto@kernel.org, gil@azul.com,
	oridgar@gmail.com, colmmacc@amazon.com, tytso@mit.edu,
	gregkh@linuxfoundation.org, rdunlap@infradead.org,
	linux-kernel@vger.kernel.org, ebiederm@xmission.com,
	ovzxemul@gmail.com, w@1wt.eu, dwmw@amazon.co.uk
Subject: Re: [PATCH v7 1/2] drivers/misc: sysgenid: add system generation id driver
Date: Wed, 24 Feb 2021 17:41:59 -0500	[thread overview]
Message-ID: <20210224173205-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <d63146a9-a3f8-14ea-2b16-cb5b3fe7aecf@amazon.com>

On Wed, Feb 24, 2021 at 02:45:03PM +0100, Alexander Graf wrote:
> > Above should try harder to explan what are the things that need to be
> > scrubbed and why. For example, I personally don't really know what is
> > the OpenSSL session token example and what makes it vulnerable. I guess
> > snapshots can attack each other?
> > 
> > 
> > 
> > 
> > Here's a simple example of a workflow that submits transactions
> > to a database and wants to avoid duplicate transactions.
> > This does not require overseer magic. It does however require
> > a correct genid from hypervisor, so no mmap tricks work.
> > 
> > 
> > 
> >          int genid, oldgenid;
> >          read(&genid);
> > start:
> >          oldgenid = genid;
> >          transid = submit transaction
> >          read(&genid);
> >          if (genid != oldgenid) {
> >                          revert transaction (transid);
> >                          goto start:
> >          }
> 
> I'm not sure I fully follow. For starters, if this is a VM local database, I
> don't think you'd care about the genid. If it's a remote database, your
> connection would get dropped already at the point when you clone/resume,
> because TCP and your connection state machine will get really confused when
> you suddenly have a different IP address or two consumers of the same stream
> :).
>
> But for the sake of the argument, let's assume you can have a connectionless
> database connection that maintains its own connection uniqueness logic.

Right. E.g. not uncommon with REST APIs. They survive disconnect easily
and use cookies or such.

> That
> database connector would need to understand how to abort the connection (and
> thus the transaction!) when the generation changes.

the point is that instead of all that you discover transaction as
a duplicate and revert it.


> And that's logic you
> would do with the read/write/notify mechanism. So your main loop would check
> for reads on the genid fd and after sending a connection termination, notify
> the overlord that it's safe to use the VM now.
> 
> The OpenSSL case (with mmap) is for libraries that are stateless and can not
> guarantee that they receive a genid notification event timely.
> 
> Since you asked, this is mainly important for the PRNG. Imagine an https
> server. You create a snapshot. You resume from that snapshot. OpenSSL is
> fully initialized with a user space PRNG randomness pool that it considers
> safe to consume. However, that means your first connection after resume will
> be 100% predictable randomness wise.

I wonder whether something similar is possible here. I.e. use the secret
to encrypt stuff but check the gen ID before actually sending data.
If it changed re-encrypt. Hmm?

> 
> The mmap mechanism allows the PRNG to reseed after a genid change. Because
> we don't have an event mechanism for this code path, that can happen minutes
> after the resume. But that's ok, we "just" have to ensure that nobody is
> consuming secret data at the point of the snapshot.


Something I am still not clear on is whether it's really important to
skip the system call here. If not I think it's prudent to just stick
to read for now, I think there's a slightly lower chance that
it will get misused. mmap which gives you a laggy gen id value
really seems like it would be hard to use correctly.


> > 
> > 
> > 
> > 
> > 
> > 
> > > +Simplifyng assumption - safety prerequisite
> > > +-------------------------------------------
> > > +
> > > +**Control the snapshot flow**, disallow snapshots coming at arbitrary
> > > +moments in the workload lifetime.
> > > +
> > > +Use a system-level overseer entity that quiesces the system before
> > > +snapshot, and post-snapshot-resume oversees that software components
> > > +have readjusted to new environment, to the new generation. Only after,
> > > +will the overseer un-quiesce the system and allow active workloads.
> > > +
> > > +Software components can choose whether they want to be tracked and
> > > +waited on by the overseer by using the ``SYSGENID_SET_WATCHER_TRACKING``
> > > +IOCTL.
> > > +
> > > +The sysgenid framework standardizes the API for system software to
> > > +find out about needing to readjust and at the same time provides a
> > > +mechanism for the overseer entity to wait for everyone to be done, the
> > > +system to have readjusted, so it can un-quiesce.
> > > +
> > > +Example snapshot-safe workflow
> > > +------------------------------
> > > +
> > > +1) Before taking a snapshot, quiesce the VM/container/system. Exactly
> > > +   how this is achieved is very workload-specific, but the general
> > > +   description is to get all software to an expected state where their
> > > +   event loops dry up and they are effectively quiesced.
> > 
> > If you have ability to do this by communicating with
> > all processes e.g. through a unix domain socket,
> > why do you need the rest of the stuff in the kernel?
> > Quescing is a harder problem than waking up.
> 
> That depends. Think of a typical VM workload. Let's take the web server
> example again. You can preboot the full VM and snapshot it as is. As long as
> you don't allow any incoming connections, you can guarantee that the system
> is "quiesced" well enough for the snapshot.

Well you can use a firewall or such to block incoming packets,
but I am not at all sure that means e.g. all socket buffers
are empty.


> This is really what this bullet point is about. The point is that you're not
> consuming randomness you can't reseed asynchronously (see the above OpenSSL
> PRNG example).
> 
> 
> Alex
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 



  reply	other threads:[~2021-02-24 22:44 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-24  8:47 [PATCH v7 0/2] System Generation ID driver and VMGENID backend Adrian Catangiu
2021-02-24  8:47 ` Adrian Catangiu
2021-02-24  8:47 ` [PATCH v7 1/2] drivers/misc: sysgenid: add system generation id driver Adrian Catangiu
2021-02-24  8:47   ` Adrian Catangiu
2021-02-24  9:19   ` Michael S. Tsirkin
2021-02-24  9:19     ` Michael S. Tsirkin
2021-02-24 13:45     ` Alexander Graf
2021-02-24 13:45       ` Alexander Graf
2021-02-24 22:41       ` Michael S. Tsirkin [this message]
2021-02-24 22:41         ` Michael S. Tsirkin
2021-02-24 23:22         ` Alexander Graf
2021-02-24  8:47 ` [PATCH v7 2/2] drivers/virt: vmgenid: add vm " Adrian Catangiu
2021-02-24  8:47   ` Adrian Catangiu
2022-02-22 21:24   ` Jason A. Donenfeld
2022-02-22 21:24     ` Jason A. Donenfeld
2022-02-22 22:17     ` Jason A. Donenfeld
2022-02-22 22:17       ` Jason A. Donenfeld
2022-02-23 13:21       ` Jason A. Donenfeld
2022-02-23 13:21         ` Jason A. Donenfeld
2021-02-24  9:05 ` [PATCH v7 0/2] System Generation ID driver and VMGENID backend Michael S. Tsirkin
2021-02-24  9:05   ` Michael S. Tsirkin
2021-03-04 20:08   ` Catangiu, Adrian Costin
2021-02-24 23:00 [PATCH v7 1/2] drivers/misc: sysgenid: add system generation id driver MacCarthaigh, Colm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210224173205-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=0x7f454c46@gmail.com \
    --cc=Jason@zx2c4.com \
    --cc=acatan@amazon.com \
    --cc=areber@redhat.com \
    --cc=arnd@arndb.de \
    --cc=asmehra@redhat.com \
    --cc=avagin@gmail.com \
    --cc=bonzini@gnu.org \
    --cc=borntraeger@de.ibm.com \
    --cc=colmmacc@amazon.com \
    --cc=corbet@lwn.net \
    --cc=dgunigun@redhat.com \
    --cc=dwmw@amazon.co.uk \
    --cc=ebiederm@xmission.com \
    --cc=ebiggers@kernel.org \
    --cc=ghammer@redhat.com \
    --cc=gil@azul.com \
    --cc=graf@amazon.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=oridgar@gmail.com \
    --cc=ovzxemul@gmail.com \
    --cc=pavel@ucw.cz \
    --cc=ptikhomirov@virtuozzo.com \
    --cc=qemu-devel@nongnu.org \
    --cc=raduweis@amazon.com \
    --cc=rafael@kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=rppt@kernel.org \
    --cc=sblbir@amazon.com \
    --cc=tytso@mit.edu \
    --cc=vijaysun@ca.ibm.com \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.