* [Xen-devel] Design session report: Live-Updating Xen
From: Foerster, Leonard @ 2019-07-15 18:57 UTC (permalink / raw)
  To: xen-devel


Here is the summary/notes from the Xen Live-Update Design session last week.
I tried to tie together the different topics we talked about into some sections.

https://cryptpad.fr/pad/#/2/pad/edit/fCwXg1GmSXXG8bc4ridHAsnR/

--
Leonard

LIVE UPDATING XEN - DESIGN SESSION

Brief project overview:
	-> We want to build Xen Live-update
	-> early prototyping phase
	IDEA: change running hypervisor to new one without guest disruptions
	-> Reasons:
		* Security - we might need an updated version for vulnerability mitigation
		* Development cycle acceleration - fast switch to a new hypervisor during development
		* Maintainability - reduce version diversity in the fleet
	-> We are currently eyeing a combination of guest transparent live migration
		and kexec into a new xen build
	-> For more details: https://xensummit19.sched.com/event/PFVQ/live-updating-xen-amit-shah-david-woodhouse-amazon

Terminology:
	Running Xen -> The xen running on the host before update (Source)
	Target Xen -> The xen we are updating *to*

Design discussions:

Live-update ties into multiple other projects currently done in the Xen-project:

	* Secret free Xen: reduce the footprint of guest relevant data in Xen
		-> less state we might have to handle in the live update case
	* dom0less: bootstrap domains without the involvement of dom0
		-> this might come in handy to at least set up and continue dom0 on the target xen
		-> If we have this, it might also enable us to de-serialize the state for
			other guest domains in xen and not have to wait for dom0 to do this

We want to just keep domain and hardware state
	-> Xen itself is supposed to be exchanged completely
	-> We have to keep the IOMMU page tables around and must not touch them
		-> this might also come in handy for some newer UEFI boot related issues?
		-> We might have to go and re-inject certain interrupts
	-> do we need to dis-aggregate xenheap and domheap here?
		-> We are currently trying to avoid this

A key cornerstone for Live-update is guest transparent live migration
	-> This means we are using a well defined ABI for saving/restoring domain state
		-> We do only rely on domain state and no internal xen state
	-> The idea is to migrate the guest not from one machine to another (in space)
		but on the same machine from one hypervisor to another (in time)
	-> In addition we want to keep as much as possible in memory unchanged and feed
		this back to the target domain in order to save time
	-> This means we will need additional info on those memory areas and have to
		be super careful not to stomp over them while starting the target xen (see sketch below)
	-> for live migration: domid is a problem in this case
		-> randomize and pray does not work on smaller fleets
		-> this is not a problem for live-update
		-> BUT: as a community we should make this restriction go away
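
	-> Illustrative sketch of the kind of "additional info" the running xen could
		hand over for those preserved memory areas (names and fields are made up
		for illustration, no such ABI exists yet):

		/* Hypothetical table of memory ranges the target xen must not touch. */
		#include <stdint.h>

		enum lu_range_kind {
		    LU_RANGE_DOMAIN_RAM,    /* guest RAM kept in place */
		    LU_RANGE_IOMMU_PT,      /* IOMMU page tables left untouched */
		    LU_RANGE_STATE_STREAM,  /* serialized domain state records */
		};

		struct lu_preserved_range {
		    uint64_t start_mfn;     /* first machine frame of the range */
		    uint64_t nr_frames;     /* length in 4 KiB frames */
		    uint32_t kind;          /* enum lu_range_kind */
		    uint32_t domid;         /* owning domain, if any */
		};

		struct lu_preserved_table {
		    uint32_t version;       /* ABI version of this table */
		    uint32_t nr_ranges;
		    struct lu_preserved_range ranges[];
		};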

Exchanging the Hypervisor using kexec
	-> We have patches merged in upstream kexec-tools that enable multiboot2 for Xen
	-> We can now load the target xen binary into the crashdump region to not stomp
		over any valuable data we might need later (see sketch below)
	-> But using the crashdump region for this has drawbacks when it comes to debugging
		and we might want to think about this later
		-> What happens when live-update goes wrong?
		-> Option: Increase Crashdump region size and partition it or have a separate
			reserved live-update region to load the target xen into 
		-> Separate region or partitioned region is not a priority for V1 but should
			be on the road map for future versions
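
	-> Rough illustration of the loading step: the merged kexec-tools multiboot2
		support does the real work of computing the segments and entry point, but
		the kernel interface it drives looks roughly like this (sketch only):

		/* Hand a target xen image to the reserved crash-kernel region via the
		 * raw kexec_load(2) syscall; segment layout and entry are assumed given. */
		#include <linux/kexec.h>   /* struct kexec_segment, KEXEC_* flags */
		#include <sys/syscall.h>
		#include <unistd.h>

		static long load_target_xen(void *image, size_t size,
		                            unsigned long crash_base, unsigned long entry)
		{
		    struct kexec_segment seg = {
		        .buf   = image,               /* target xen binary in memory */
		        .bufsz = size,
		        .mem   = (void *)crash_base,  /* destination: crash region */
		        .memsz = (size + 4095) & ~4095UL,
		    };

		    /* KEXEC_ON_CRASH keeps the image inside the reserved region, so it
		     * cannot stomp on memory the running xen still needs. */
		    return syscall(SYS_kexec_load, entry, 1UL, &seg,
		                   KEXEC_ON_CRASH | KEXEC_ARCH_DEFAULT);
		}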

Who serializes and deserializes domain state?
	-> dom0: This should work fine, but who does this for dom0 itself?
	-> Xen: This will need some more work, but might be covered mostly by the dom0less effort on the arm side
		-> this will need some work for x86, but Stefano does not consider this a lot of work
	-> This would mean: serialize domain state into multiboot module and set domains
		up after kexecing xen in the dom0less manner
		-> make the multiboot module general enough so we can tag it as boot/resume/create/etc. (sketched below)
			-> this will also enable us to do per-guest feature enablement
			-> finer granularity than specifying on the cmdline
			-> cmdline stuff is mostly broken, needs to be fixed for nested either way
			-> domain create flags are a mess
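
	-> Hypothetical shape of such a tagged module header (illustrative only,
		nothing like this is specified yet):

		#include <stdint.h>

		enum lu_module_kind {
		    LU_MOD_BOOT,    /* plain boot module (kernel/initrd), dom0less style */
		    LU_MOD_CREATE,  /* create a fresh domain from this module */
		    LU_MOD_RESUME,  /* resume a domain from serialized state */
		};

		struct lu_module_header {
		    uint32_t magic;          /* marks a tagged live-update module */
		    uint32_t kind;           /* enum lu_module_kind */
		    uint32_t domid;          /* domain the module belongs to, if any */
		    uint64_t feature_flags;  /* per-guest feature enablement */
		    uint64_t payload_len;    /* bytes of payload after this header */
		};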

Live update instead of crashdump?
	-> Can we use such capabilities to recover from a crash by "restarting" xen on a crash?
		-> live updating into (the same) xen on crash
	-> crashing is a good mechanism because it happens if something is really broken and
		most likely not recoverable
	-> Live update should be a conscious process and not something you do as reaction to a crash
		-> something is really broken if we crash
		-> we should not proactively restart xen on crash
			-> we might run into crash loops
	-> maybe this can be done in the future, but it is not changing anything for the design
		-> if anybody wants to wire this up once live update is there, that should not be too hard
		-> then you want to think about: scattering the domains to multiple other hosts to not keep
			them on broken machines

We should use this opportunity to clean up certain parts of the code base:
	-> interface for domain information is a mess
		-> HVM and PV have some shared data but completely different ways of accessing it

Volume of patches:
	-> Live update: still developing, we do not know yet
	-> guest transparent live migration:
		-> We have roughly 100 patches over time
		-> we believe most of this just has to be cleaned up/squashed and
			will land us at a much lower, more reasonable number
		-> this also needs 2-3 dom0 kernel patches

Summary of action items:
	-> coordinate with dom0less effort on what we can use and contribute there
	-> fix the domid clash problem
	-> Decision on usage of crash kernel area
	-> fix live migration patch set to include yet unsupported backends
		-> clean up the patch set
		-> upstream it

Longer term vision:

* Have a tiny hypervisor between Guest and Xen that handles the common cases
	-> this enables (almost) zero downtime for the guest
	-> the tiny hypervisor will maintain the guest while the underlying xen is kexecing into the new build

* Somebody someday will want to get rid of the long tail of old xen versions in a fleet
	-> live patch old running versions with live update capability?
	-> crashdumping into a new hypervisor?
		-> "crazy idea" but this will likely come up at some point


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Sarah Newman @ 2019-07-15 19:31 UTC (permalink / raw)
  To: Foerster, Leonard, xen-devel

On 7/15/19 11:57 AM, Foerster, Leonard wrote:
...
> A key cornerstone for Live-update is guest transparent live migration
...
> 	-> for live migration: domid is a problem in this case
> 		-> randomize and pray does not work on smaller fleets
> 		-> this is not a problem for live-update
> 		-> BUT: as a community we shoudl make this restriction go away

Andrew Cooper pointed out to me that manually assigning domain IDs is supported in much of the code already. If guest transparent live migration gets 
merged, we'll look at passing in a domain ID to xl, which would be good enough for us. I don't know about the other toolstacks.

--Sarah


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Juergen Gross @ 2019-07-16  3:48 UTC (permalink / raw)
  To: Sarah Newman, Foerster, Leonard, xen-devel

On 15.07.19 21:31, Sarah Newman wrote:
> On 7/15/19 11:57 AM, Foerster, Leonard wrote:
> ...
>> A key cornerstone for Live-update is guest transparent live migration
> ...
>>     -> for live migration: domid is a problem in this case
>>         -> randomize and pray does not work on smaller fleets
>>         -> this is not a problem for live-update
>>         -> BUT: as a community we shoudl make this restriction go away
> 
> Andrew Cooper pointed out to me that manually assigning domain IDs is 
> supported in much of the code already. If guest transparent live 
> migration gets merged, we'll look at passing in a domain ID to xl, which 
> would be good enough for us. I don't know about the other toolstacks.

The main problem is the case where on the target host the domid of the
migrated domain is already in use by another domain. So you either need
a domid allocator spanning all hosts or the change of domid during
migration must be hidden from the guest for guest transparent migration.


Juergen


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Sarah Newman @ 2019-07-16  4:20 UTC (permalink / raw)
  To: Juergen Gross, Foerster, Leonard, xen-devel

On 7/15/19 8:48 PM, Juergen Gross wrote:
> On 15.07.19 21:31, Sarah Newman wrote:
>> On 7/15/19 11:57 AM, Foerster, Leonard wrote:
>> ...
>>> A key cornerstone for Live-update is guest transparent live migration
>> ...
>>>     -> for live migration: domid is a problem in this case
>>>         -> randomize and pray does not work on smaller fleets
>>>         -> this is not a problem for live-update
>>>         -> BUT: as a community we shoudl make this restriction go away
>>
>> Andrew Cooper pointed out to me that manually assigning domain IDs is supported in much of the code already. If guest transparent live migration 
>> gets merged, we'll look at passing in a domain ID to xl, which would be good enough for us. I don't know about the other toolstacks.
> 
> The main problem is the case where on the target host the domid of the
> migrated domain is already in use by another domain. So you either need
> a domid allocator spanning all hosts or the change of domid during
> migration must be hidden from the guest for guest transparent migration.

Yes. There are some cluster management systems which use xl rather than xapi.
They could be extended to manage domain IDs if it's too difficult to allow
the domain ID to change during migration.
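
A minimal sketch of what that could look like, assuming each host is simply
handed a disjoint slice of the usable domid space (everything below
DOMID_FIRST_RESERVED); purely illustrative, not an existing interface:

    /* Illustrative only: carve the usable domid space into per-host slices
     * so no two hosts can ever hand out the same domid. */
    #include <stdint.h>

    #define DOMID_FIRST_RESERVED 0x7FF0u  /* from xen/include/public/xen.h */

    struct domid_slice {
        uint16_t next;  /* next candidate domid on this host */
        uint16_t end;   /* exclusive upper bound of this host's slice */
    };

    static struct domid_slice slice_for_host(unsigned host, unsigned nr_hosts)
    {
        uint16_t span = DOMID_FIRST_RESERVED / nr_hosts;
        struct domid_slice s = {
            .next = (uint16_t)(host * span),
            .end  = (uint16_t)((host + 1) * span),
        };
        if (s.next == 0)
            s.next = 1;  /* domid 0 is dom0 */
        return s;
    }

    static int alloc_domid(struct domid_slice *s, uint16_t *out)
    {
        if (s->next >= s->end)
            return -1;   /* slice exhausted */
        *out = s->next++;
        return 0;
    }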

--Sarah


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Andrew Cooper @ 2019-07-16 22:27 UTC (permalink / raw)
  To: Sarah Newman, Juergen Gross, Foerster,  Leonard, xen-devel

On 16/07/2019 05:20, Sarah Newman wrote:
> On 7/15/19 8:48 PM, Juergen Gross wrote:
>> On 15.07.19 21:31, Sarah Newman wrote:
>>> On 7/15/19 11:57 AM, Foerster, Leonard wrote:
>>> ...
>>>> A key cornerstone for Live-update is guest transparent live migration
>>> ...
>>>>     -> for live migration: domid is a problem in this case
>>>>         -> randomize and pray does not work on smaller fleets
>>>>         -> this is not a problem for live-update
>>>>         -> BUT: as a community we shoudl make this restriction go away
>>>
>>> Andrew Cooper pointed out to me that manually assigning domain IDs
>>> is supported in much of the code already. If guest transparent live
>>> migration gets merged, we'll look at passing in a domain ID to xl,
>>> which would be good enough for us. I don't know about the other
>>> toolstacks.
>>
>> The main problem is the case where on the target host the domid of the
>> migrated domain is already in use by another domain. So you either need
>> a domid allocator spanning all hosts or the change of domid during
>> migration must be hidden from the guest for guest transparent migration.
>
> Yes. There are some cluster management systems which use xl rather
> than xapi.
> They could be extended to manage domain IDs if it's too difficult to
> allow
> the domain ID to change during migration.

For a v1 feature, having a restriction of "you must manage domids across
the cluster" is fine.  Guest-transparent migration is a very important
feature, and one where we are lacking in relation to other hypervisors.

Longer term, we as the Xen community need to figure out a way to remove
the dependency on domids, at which point the cluster-wide management
restriction can be dropped.  This isn't going to be a trivial task, but
it will be a worthwhile one.

~Andrew


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Andrew Cooper @ 2019-07-16 23:51 UTC (permalink / raw)
  To: Foerster, Leonard, xen-devel

On 15/07/2019 19:57, Foerster, Leonard wrote:
> Here is the summary/notes from the Xen Live-Update Design session last week.
> I tried to tie together the different topics we talked about into some sections.
>
> https://cryptpad.fr/pad/#/2/pad/edit/fCwXg1GmSXXG8bc4ridHAsnR/
>
> --
> Leonard
>
> LIVE UPDATING XEN - DESING SESSION
>
> Brief project overview:
> 	-> We want to build Xen Live-update
> 	-> early prototyping phase
> 	IDEA: change running hypervisor to new one without guest disruptions
> 	-> Reasons:
> 		* Security - we might need an updated versions for vulnerability mitigation

I know I'm going to regret saying this, but livepatches are probably a
better bet in most cases for targeted security fixes.

> 		* Development cycle acceleration - fast switch to hypervisor during development
> 		* Maintainability - reduce version diversity in the fleet

:) I don't expect you to admit anything concrete on xen-devel, but I do
hope the divergence is at least a little better under control than last
time I got given an answer to this question.

> 	-> We are currently eyeing a combination of guest transparent live migration
> 		and kexec into a new xen build
> 	-> For more details: https://xensummit19.sched.com/event/PFVQ/live-updating-xen-amit-shah-david-woodhouse-amazon
>
> Terminology:
> 	Running Xen -> The xen running on the host before update (Source)
> 	Target Xen -> The xen we are updating *to*
>
> Design discussions:
>
> Live-update ties into multiple other projects currently done in the Xen-project:
>
> 	* Secret free Xen: reduce the footprint of guest relevant data in Xen
> 		-> less state we might have to handle in the live update case

I don't immediately see how this is related.  Secret-free Xen is to do
with having fewer things mapped by default.  It doesn't fundamentally
change the data that Xen needs to hold about guests, nor how this gets
arranged in memory.

> 	* dom0less: bootstrap domains without the involvement of dom0
> 		-> this might come in handy to at least setup and continue dom0 on target xen
> 		-> If we have this this might also enable us to de-serialize the state for
> 			other guest-domains in xen and not have to wait for dom0 to do this

Reconstruction of dom0 is something which Xen will definitely need to
do.  With the memory still in place, it's just a fairly small amount of register
state which needs restoring.

That said, reconstruction of the typerefs will be an issue.  Walking
over a fully populated L4 tree can (in theory) take minutes, and it's not
safe to just start executing without reconstruction.

Depending on how bad it is in practice, one option might be to do a
demand validate of %rip and %rsp, along with a hybrid shadow mode which
turns faults into typerefs, which would allow the gross cost of
revalidation to be amortised while the vcpus were executing.  We would
definitely want some kind of logic to aggressively typeref outstanding
pagetables so the shadow mode could be turned off.

> We want to just keep domain and hardware state
> 	-> Xen is supposedly completely to be exchanged
> 	-> We have to keep around the IOMMU page tables and do not touch them
> 		-> this might also come in handy for some newer UEFI boot related issues?

This is for Pre-DXE DMA protection, which IIRC is part of the UEFI 2.7
spec.  It basically means that the IOMMU is set up and inhibiting DMA
before any firmware starts using RAM.

In both cases, it involves Xen's IOMMU driver being capable of
initialising with the IOMMU already active, and in a way which keeps DMA
and interrupt remapping safe.

This is a chunk of work which should probably be split out into an
independent prerequisite.

> 		-> We might have to go and re-inject certain interrupts

What hardware are you targeting here?  IvyBridge and later has a posted
interrupt descriptor which can accumulate pending interrupts (at least
manually), and newer versions (Broadwell?) can accumulate interrupts
directly from hardware.

> 	-> do we need to dis-aggregate xenheap and domheap here?
> 		-> We are currently trying to avoid this

I don't think this will be necessary, or indeed a useful thing to try
considering.  There should be an absolute minimal amount of dependency
between the two versions of Xen, to allow for the maximum flexibility in
upgradeable scenarios.

>
> A key cornerstone for Live-update is guest transparent live migration
> 	-> This means we are using a well defined ABI for saving/restoring domain state
> 		-> We do only rely on domain state and no internal xen state

Absolutely.  One issue I discussed with David a while ago is that even
across an upgrade of Xen, the format of the EPT/NPT pagetables might
change, at least in terms of the layout of software bits.  (Especially
for EPT where we slowly lose software bits to new hardware features we
wish to use.)

> 	-> The idea is to migrate the guest not from one machine to another (in space)
> 		but on the same machine from one hypervisor to another (in time)
> 	-> In addition we want to keep as much as possible in memory unchanged and feed
> 		this back to the target domain in order to save time
> 	-> This means we will need additional info on those memory areas and have to
> 		be super careful not to stomp over them while starting the target xen
> 	-> for live migration: domid is a problem in this case
> 		-> randomize and pray does not work on smaller fleets
> 		-> this is not a problem for live-update
> 		-> BUT: as a community we shoudl make this restriction go away
>
> Exchanging the Hypervisor using kexec
> 	-> We have patches on upstream kexec-tools merged that enable multiboot2 for Xen
> 	-> We can now load the target xen binary to the crashdump region to not stomp
> 		over any valuable date we might need later
> 	-> But using the crashdump region for this has drawbacks when it comes to debugging
> 		and we might want to think about this later
> 		-> What happens when live-update goes wrong?
> 		-> Option: Increase Crashdump region size and partition it or have a separate
> 			reserved live-update region to load the target xen into 
> 		-> Separate region or partitioned region is not a priority for V1 but should
> 			be on the road map for future versions

In terms of things needing physical contiguity, there is the Xen image
itself (a few MB), various driver datastructures (the IOMMU interrupt
remapping tables in particular, but I think we can probably scale the
size by the number of vectors behind them in practice, rather than
always making an order 7(or 8?) allocation to cover all 64k possible
handles.)  I think some of the directmap setup also expects to be able
to find free 2M superpages.
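
For the VT-d table, assuming the usual 128-bit IRTE, the arithmetic is:

    /* Back-of-envelope, assuming VT-d's 128-bit (16 byte) IRTEs:
     *   65536 handles * 16 bytes = 1 MiB of contiguous memory
     *   1 MiB / 4 KiB pages      = 256 pages = 2^8, i.e. an order-8 allocation
     * Scaling by the number of vectors actually in use shrinks this. */
    #define IRTE_BYTES       16UL
    #define MAX_IRT_HANDLES  65536UL
    #define IRT_TABLE_BYTES  (IRTE_BYTES * MAX_IRT_HANDLES)  /* 1 MiB */
    #define IRT_TABLE_ORDER  8                               /* 4 KiB pages */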

>
> Who serializes and deserializes domain state?
> 	-> dom0: This should work fine, but who does this for dom0 itself?
> 	-> Xen: This will need some more work, but might covered mostly by the dom0less effort on the arm side
> 		-> this will need some work for x86, but Stefano does not consider this a lot of work
> 	-> This would mean: serialize domain state into multiboot module and set domains
> 		up after kexecing xen in the dom0less manner
> 		-> make multiboot module general enough so we can tag it as boot/resume/create/etc.
> 			-> this will also enable us to do per-guest feature enablement

What is the intent here?

> 			-> finer granular than specifying on cmdline
> 			-> cmdline stuff is mostly broken, needs to be fixed for nested either way
> 			-> domain create flags is a mess

There is going to have to be some kind of translation from old state to
new settings.  In the past, lots of Xen was based on global settings, and
this is slowly being fixed into concrete per-domain settings.

>
> Live update instead of crashdump?
> 	-> Can we use such capabilities to recover from a crash be "restarting" xen on a crash?
> 		-> live updating into (the same) xen on crash
> 	-> crashing is a good mechanism because it happens if something is really broken and
> 		most likely not recoverable
> 	-> Live update should be a conscious process and not something you do as reaction to a crash
> 		-> something is really broken if we crash
> 		-> we should not proactively restart xen on crash
> 			-> we might run into crash loops
> 	-> maybe this can be done in the future, but it is not changing anything for the design
> 		-> if anybody wants to wire this up once live update is there, that should not be too hard
> 		-> then you want to think about: scattering the domains to multiple other hosts to not keep
> 			them on broken machines
>
> We should use this opportunity to clean up certain parts of the code base:
> 	-> interface for domain information is a mess
> 		-> HVM and PV have some shared data but completely different ways of accessing it
>
> Volume of patches:
> 	-> Live update: still developing, we do not know yet
> 	-> guest transparent live migration:
> 		-> We have roughly 100 patches over time
> 		-> we believe most of this has just to be cleaned up/squashed and
> 			will land us at a reasonable much lower number
> 		-> this also needs 2-3 dom0 kernel patches
>
> Summary of action items:
> 	-> coordinate with dom0less effort on what we can use and contribute there
> 	-> fix the domid clash problem
> 	-> Decision on usage of crash kernel area
> 	-> fix live migration patch set to include yet unsupported backends
> 		-> clean up the patch set
> 		-> upstream it
>
> Longer term vision:
>
> * Have a tiny hypervisor between Guest and Xen that handles the common cases
> 	-> this enables (almost) zero downtime for the guest
> 	-> the tiny hypervisor will maintain the guest while the underlying xen is kexecing into new build
>
> * Somebody someday will want to get rid of the long tail of old xen versions in a fleet
> 	-> live patch old running versions with live update capability?
> 	-> crashdumping into a new hypervisor?
> 		-> "crazy idea" but this will likely come up at some point

How much do you need to patch an old Xen to have kexec take over
cleanly?  Almost all of the complexity is on the destination side
AFAICT, which is good from a development point of view.

~Andrew


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Jan Beulich @ 2019-07-17  7:09 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Leonard Foerster

On 17.07.2019 01:51, Andrew Cooper wrote:
> On 15/07/2019 19:57, Foerster, Leonard wrote:
>> 	* dom0less: bootstrap domains without the involvement of dom0
>> 		-> this might come in handy to at least setup and continue dom0 on target xen
>> 		-> If we have this this might also enable us to de-serialize the state for
>> 			other guest-domains in xen and not have to wait for dom0 to do this
> 
> Reconstruction of dom0 is something which Xen will definitely need to
> do.  With the memory still in place, its just a fairly small of register
> state which needs restoring.
> 
> That said, reconstruction of the typerefs will be an issue.  Walking
> over a fully populated L4 tree can (in theory) take minutes, and its not
> safe to just start executing without reconstruction.
> 
> Depending on how bad it is in practice, one option might be to do a
> demand validate of %rip and %rsp, along with a hybrid shadow mode which
> turns faults into typerefs, which would allow the gross cost of
> revalidation to be amortised while the vcpus were executing.  We would
> definitely want some kind of logic to aggressively typeref outstanding
> pagetables so the shadow mode could be turned off.

Neither walking the page table trees nor an on-demand re-creation can
possibly work, as pointed out during (partly informal) discussion: At
the very least the allocated and pinned states of pages can only be
transferred. Hence we seem to have come to agreement that struct
page_info instances have to be transformed (in place if possible, i.e.
when the sizes match, otherwise by copying).
>> 		-> We might have to go and re-inject certain interrupts
> 
> What hardware are you targeting here?  IvyBridge and later has a posted
> interrupt descriptor which can accumulate pending interrupts (at least
> manually), and newer versions (Broadwell?) can accumulate interrupts
> directly from hardware.

For HVM/PVH perhaps that's good enough. What about PV though?

>> A key cornerstone for Live-update is guest transparent live migration
>> 	-> This means we are using a well defined ABI for saving/restoring domain state
>> 		-> We do only rely on domain state and no internal xen state
> 
> Absolutely.  One issue I discussed with David a while ago is that even
> across an upgrade of Xen, the format of the EPT/NPT pagetables might
> change, at least in terms of the layout of software bits.  (Especially
> for EPT where we slowly lose software bits to new hardware features we
> wish to use.)

Right, and therefore a similar transformation like for struct page_info
may be unavoidable here too.

Re-using large data structures (or arrays thereof) may also turn out
useful in terms of latency until the new Xen actually becomes ready to
resume.

Jan

* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Andrew Cooper @ 2019-07-17 11:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Leonard Foerster

On 17/07/2019 08:09, Jan Beulich wrote:
> On 17.07.2019 01:51, Andrew Cooper wrote:
>> On 15/07/2019 19:57, Foerster, Leonard wrote:
>>> 	* dom0less: bootstrap domains without the involvement of dom0
>>> 		-> this might come in handy to at least setup and continue dom0 on target xen
>>> 		-> If we have this this might also enable us to de-serialize the state for
>>> 			other guest-domains in xen and not have to wait for dom0 to do this
>> Reconstruction of dom0 is something which Xen will definitely need to
>> do.  With the memory still in place, its just a fairly small of register
>> state which needs restoring.
>>
>> That said, reconstruction of the typerefs will be an issue.  Walking
>> over a fully populated L4 tree can (in theory) take minutes, and its not
>> safe to just start executing without reconstruction.
>>
>> Depending on how bad it is in practice, one option might be to do a
>> demand validate of %rip and %rsp, along with a hybrid shadow mode which
>> turns faults into typerefs, which would allow the gross cost of
>> revalidation to be amortised while the vcpus were executing.  We would
>> definitely want some kind of logic to aggressively typeref outstanding
>> pagetables so the shadow mode could be turned off.
> Neither walking the page table trees nor and on-demand re-creation can
> possibly work, as pointed out during (partly informal) discussion: At
> the very least the allocated and pinned states of pages can only be
> transferred.

Pinned state exists in the current migrate stream.  Allocated does not -
it is an internal detail of how Xen handles the memory.

But yes - this observation means that we can't simply walk the guest
pagetables.

> Hence we seem to have come to agreement that struct
> page_info instances have to be transformed (in place if possible, i.e.
> when the sizes match, otherwise by copying).

-10 to this idea, if it can possibly be avoided.  In this case, it
definitely can be avoided.

We do not want to be grovelling around in the old Xen's datastructures,
because that adds a binary A=>B translation which is
per-old-version-of-xen, meaning that you need a custom build of each
target Xen which depends on the currently-running Xen, or have to
maintain a matrix of old versions which will be dependent on the local
changes, and therefore not suitable for upstream.

>>> 		-> We might have to go and re-inject certain interrupts
>> What hardware are you targeting here?  IvyBridge and later has a posted
>> interrupt descriptor which can accumulate pending interrupts (at least
>> manually), and newer versions (Broadwell?) can accumulate interrupts
>> directly from hardware.
> For HVM/PVH perhaps that's good enough. What about PV though?

What about PV?

The in-guest evtchn data structure will accumulate events just like a
posted interrupt descriptor.  Real interrupts will queue in the LAPIC
during the transition period.

We obviously can't let interrupts be dropped, but there also shouldn't
be any need to re-inject any.

>>> A key cornerstone for Live-update is guest transparent live migration
>>> 	-> This means we are using a well defined ABI for saving/restoring domain state
>>> 		-> We do only rely on domain state and no internal xen state
>> Absolutely.  One issue I discussed with David a while ago is that even
>> across an upgrade of Xen, the format of the EPT/NPT pagetables might
>> change, at least in terms of the layout of software bits.  (Especially
>> for EPT where we slowly lose software bits to new hardware features we
>> wish to use.)
> Right, and therefore a similar transformation like for struct page_info
> may be unavoidable here too.

None of that lives in the current migrate stream.  Again - it is
internal details, so is not something which is appropriate to be
inspected by the target Xen.

> Re-using large data structures (or arrays thereof) may also turn out
> useful in terms of latency until the new Xen actually becomes ready to
> resume.

When it comes to optimising the latency, there is a fair amount we might
be able to do ahead of the critical region, but I still think this would
be better done in terms of a "clean start" in the new Xen to reduce
binary dependences.

~Andrew


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Jan Beulich @ 2019-07-17 13:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Leonard Foerster

On 17.07.2019 13:26, Andrew Cooper wrote:
> On 17/07/2019 08:09, Jan Beulich wrote:
>> On 17.07.2019 01:51, Andrew Cooper wrote:
>>> On 15/07/2019 19:57, Foerster, Leonard wrote:
>>>> 	* dom0less: bootstrap domains without the involvement of dom0
>>>> 		-> this might come in handy to at least setup and continue dom0 on target xen
>>>> 		-> If we have this this might also enable us to de-serialize the state for
>>>> 			other guest-domains in xen and not have to wait for dom0 to do this
>>> Reconstruction of dom0 is something which Xen will definitely need to
>>> do.  With the memory still in place, its just a fairly small of register
>>> state which needs restoring.
>>>
>>> That said, reconstruction of the typerefs will be an issue.  Walking
>>> over a fully populated L4 tree can (in theory) take minutes, and its not
>>> safe to just start executing without reconstruction.
>>>
>>> Depending on how bad it is in practice, one option might be to do a
>>> demand validate of %rip and %rsp, along with a hybrid shadow mode which
>>> turns faults into typerefs, which would allow the gross cost of
>>> revalidation to be amortised while the vcpus were executing.  We would
>>> definitely want some kind of logic to aggressively typeref outstanding
>>> pagetables so the shadow mode could be turned off.
>> Neither walking the page table trees nor and on-demand re-creation can
>> possibly work, as pointed out during (partly informal) discussion: At
>> the very least the allocated and pinned states of pages can only be
>> transferred.
> 
> Pinned state exists in the current migrate stream.  Allocated does not -
> it is an internal detail of how Xen handles the memory.
> 
> But yes - this observation means that we can't simply walk the guest
> pagetables.
> 
>> Hence we seem to have come to agreement that struct
>> page_info instances have to be transformed (in place if possible, i.e.
>> when the sizes match, otherwise by copying).
> 
> -10 to this idea, if it can possibly be avoided.  In this case, it
> definitely can be avoided.
> 
> We do not want to be grovelling around in the old Xen's datastructures,
> because that adds a binary A=>B translation which is
> per-old-version-of-xen, meaning that you need a custom build of each
> target Xen which depends on the currently-running Xen, or have to
> maintain a matrix of old versions which will be dependent on the local
> changes, and therefore not suitable for upstream.

Now the question is what alternative you would suggest. By you
saying "the pinned state lives in the migration stream", I assume
you mean to imply that Dom0 state should be handed from old to
new Xen via such a stream (minus raw data page contents)?

>>>> 		-> We might have to go and re-inject certain interrupts
>>> What hardware are you targeting here?  IvyBridge and later has a posted
>>> interrupt descriptor which can accumulate pending interrupts (at least
>>> manually), and newer versions (Broadwell?) can accumulate interrupts
>>> directly from hardware.
>> For HVM/PVH perhaps that's good enough. What about PV though?
> 
> What about PV?
> 
> The in-guest evtchn data structure will accumulate events just like a
> posted interrupt descriptor.  Real interrupts will queue in the LAPIC
> during the transition period.

Yes, that'll work as long as interrupts remain active from Xen's POV.
But if there's concern about a blackout period for HVM/PVH, then
surely there would also be such for PV.

>>>> A key cornerstone for Live-update is guest transparent live migration
>>>> 	-> This means we are using a well defined ABI for saving/restoring domain state
>>>> 		-> We do only rely on domain state and no internal xen state
>>> Absolutely.  One issue I discussed with David a while ago is that even
>>> across an upgrade of Xen, the format of the EPT/NPT pagetables might
>>> change, at least in terms of the layout of software bits.  (Especially
>>> for EPT where we slowly lose software bits to new hardware features we
>>> wish to use.)
>> Right, and therefore a similar transformation like for struct page_info
>> may be unavoidable here too.
> 
> None of that lives in the current migrate stream.  Again - it is
> internal details, so is not something which is appropriate to be
> inspected by the target Xen.
> 
>> Re-using large data structures (or arrays thereof) may also turn out
>> useful in terms of latency until the new Xen actually becomes ready to
>> resume.
> 
> When it comes to optimising the latency, there is a fair amount we might
> be able to do ahead of the critical region, but I still think this would
> be better done in terms of a "clean start" in the new Xen to reduce
> binary dependences.

Latency actually is only one aspect (albeit the larger the host, the more
relevant it is). Sufficient memory to have both old and new copies of the
data structures in place, plus the migration stream, is another. This
would especially become relevant when even DomU-s were to remain in
memory, rather than getting saved/restored.

Jan

* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Andrew Cooper @ 2019-07-17 18:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Leonard Foerster

On 17/07/2019 14:02, Jan Beulich wrote:
> On 17.07.2019 13:26, Andrew Cooper wrote:
>> On 17/07/2019 08:09, Jan Beulich wrote:
>>> On 17.07.2019 01:51, Andrew Cooper wrote:
>>>> On 15/07/2019 19:57, Foerster, Leonard wrote:
>>>>> 	* dom0less: bootstrap domains without the involvement of dom0
>>>>> 		-> this might come in handy to at least setup and continue dom0 on target xen
>>>>> 		-> If we have this this might also enable us to de-serialize the state for
>>>>> 			other guest-domains in xen and not have to wait for dom0 to do this
>>>> Reconstruction of dom0 is something which Xen will definitely need to
>>>> do.  With the memory still in place, its just a fairly small of register
>>>> state which needs restoring.
>>>>
>>>> That said, reconstruction of the typerefs will be an issue.  Walking
>>>> over a fully populated L4 tree can (in theory) take minutes, and its not
>>>> safe to just start executing without reconstruction.
>>>>
>>>> Depending on how bad it is in practice, one option might be to do a
>>>> demand validate of %rip and %rsp, along with a hybrid shadow mode which
>>>> turns faults into typerefs, which would allow the gross cost of
>>>> revalidation to be amortised while the vcpus were executing.  We would
>>>> definitely want some kind of logic to aggressively typeref outstanding
>>>> pagetables so the shadow mode could be turned off.
>>> Neither walking the page table trees nor and on-demand re-creation can
>>> possibly work, as pointed out during (partly informal) discussion: At
>>> the very least the allocated and pinned states of pages can only be
>>> transferred.
>> Pinned state exists in the current migrate stream.  Allocated does not -
>> it is an internal detail of how Xen handles the memory.
>>
>> But yes - this observation means that we can't simply walk the guest
>> pagetables.
>>
>>> Hence we seem to have come to agreement that struct
>>> page_info instances have to be transformed (in place if possible, i.e.
>>> when the sizes match, otherwise by copying).
>> -10 to this idea, if it can possibly be avoided.  In this case, it
>> definitely can be avoided.
>>
>> We do not want to be grovelling around in the old Xen's datastructures,
>> because that adds a binary A=>B translation which is
>> per-old-version-of-xen, meaning that you need a custom build of each
>> target Xen which depends on the currently-running Xen, or have to
>> maintain a matrix of old versions which will be dependent on the local
>> changes, and therefore not suitable for upstream.
> Now the question is what alternative you would suggest. By you
> saying "the pinned state lives in the migration stream", I assume
> you mean to imply that Dom0 state should be handed from old to
> new Xen via such a stream (minus raw data page contents)?

Yes, and this is explicitly identified in the bullet point saying "We do
only rely on domain state and no internal xen state".

In practice, it is going to be far more efficient to have Xen
serialise/deserialise the domain register state etc, than to bounce it
via hypercalls.  By the time you're doing that in Xen, adding dom0 as
well is trivial.
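
For reference, the existing libxc migration stream already frames everything
as simple typed records (docs/specs/libxc-migration-stream.pandoc); dom0 and
live-update state could reuse the same framing, roughly:

    /* Rough shape of a record header in the libxc migration stream (the
     * real definitions live under tools/libxc); the body follows the
     * header, padded to a multiple of 8 bytes. */
    #include <stdint.h>

    struct rec_hdr {
        uint32_t type;    /* e.g. X86_PV_INFO, HVM_CONTEXT, ... */
        uint32_t length;  /* body length in bytes, excluding padding */
    };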

>
>>>>> 		-> We might have to go and re-inject certain interrupts
>>>> What hardware are you targeting here?  IvyBridge and later has a posted
>>>> interrupt descriptor which can accumulate pending interrupts (at least
>>>> manually), and newer versions (Broadwell?) can accumulate interrupts
>>>> directly from hardware.
>>> For HVM/PVH perhaps that's good enough. What about PV though?
>> What about PV?
>>
>> The in-guest evtchn data structure will accumulate events just like a
>> posted interrupt descriptor.  Real interrupts will queue in the LAPIC
>> during the transition period.
> Yes, that'll work as long as interrupts remain active from Xen's POV.
> But if there's concern about a blackout period for HVM/PVH, then
> surely there would also be such for PV.

The only fix for that is to reduce the length of the blackout period. 
We can't magically inject interrupts half way through the xen-to-xen
transition, because we can't run vcpus at that point in time.

>
>>>>> A key cornerstone for Live-update is guest transparent live migration
>>>>> 	-> This means we are using a well defined ABI for saving/restoring domain state
>>>>> 		-> We do only rely on domain state and no internal xen state
>>>> Absolutely.  One issue I discussed with David a while ago is that even
>>>> across an upgrade of Xen, the format of the EPT/NPT pagetables might
>>>> change, at least in terms of the layout of software bits.  (Especially
>>>> for EPT where we slowly lose software bits to new hardware features we
>>>> wish to use.)
>>> Right, and therefore a similar transformation like for struct page_info
>>> may be unavoidable here too.
>> None of that lives in the current migrate stream.  Again - it is
>> internal details, so is not something which is appropriate to be
>> inspected by the target Xen.
>>
>>> Re-using large data structures (or arrays thereof) may also turn out
>>> useful in terms of latency until the new Xen actually becomes ready to
>>> resume.
>> When it comes to optimising the latency, there is a fair amount we might
>> be able to do ahead of the critical region, but I still think this would
>> be better done in terms of a "clean start" in the new Xen to reduce
>> binary dependences.
> Latency actually is only one aspect (albeit the larger the host, the more
> relevant it is). Sufficient memory to have both old and new copies of the
> data structures in place, plus the migration stream, is another. This
> would especially become relevant when even DomU-s were to remain in
> memory, rather than getting saved/restored.

But we're still talking about something which is on a multi-MB scale,
rather than multi-GB scale.

Xen itself is tiny.  Sure there are overheads from the heap management
and pagetables etc, but the overwhelming majority of used memory is
guest RAM which is staying in place.

~Andrew


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Juergen Gross @ 2019-07-18  9:00 UTC (permalink / raw)
  To: Andrew Cooper, Sarah Newman, Foerster, Leonard, xen-devel

On 17.07.19 00:27, Andrew Cooper wrote:
> On 16/07/2019 05:20, Sarah Newman wrote:
>> On 7/15/19 8:48 PM, Juergen Gross wrote:
>>> On 15.07.19 21:31, Sarah Newman wrote:
>>>> On 7/15/19 11:57 AM, Foerster, Leonard wrote:
>>>> ...
>>>>> A key cornerstone for Live-update is guest transparent live migration
>>>> ...
>>>>>      -> for live migration: domid is a problem in this case
>>>>>          -> randomize and pray does not work on smaller fleets
>>>>>          -> this is not a problem for live-update
>>>>>          -> BUT: as a community we shoudl make this restriction go away
>>>>
>>>> Andrew Cooper pointed out to me that manually assigning domain IDs
>>>> is supported in much of the code already. If guest transparent live
>>>> migration gets merged, we'll look at passing in a domain ID to xl,
>>>> which would be good enough for us. I don't know about the other
>>>> toolstacks.
>>>
>>> The main problem is the case where on the target host the domid of the
>>> migrated domain is already in use by another domain. So you either need
>>> a domid allocator spanning all hosts or the change of domid during
>>> migration must be hidden from the guest for guest transparent migration.
>>
>> Yes. There are some cluster management systems which use xl rather
>> than xapi.
>> They could be extended to manage domain IDs if it's too difficult to
>> allow
>> the domain ID to change during migration.
> 
> For a v1 feature, having a restriction of "you must manage domids across
> the cluster" is a fine.  Guest-transparent migration is a very important
> feature, and one where we are lacking in relation to other hypervisors.
> 
> Longer term, we as the Xen community need to figure out a way to remove
> the dependency on domids, at which point the cluster-wide management
> restriction can be dropped.  This isn't going to be a trivial task, but
> it will be a worthwhile one.

Another problem is Xenstore watches. With guest transparent LM they are
lost today as there is currently no way to migrate them to the target 
Xenstore.

Live-Update could work around this issue via Xenstore-stubdom.
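
(Illustrative only: carrying watches across would mean serialising at least
the per-connection (path, token) pairs xenstored keeps, e.g. something like
the sketch below; nothing of the sort exists today.)

    #include <stdint.h>

    /* Hypothetical serialized form of one registered watch. */
    struct xs_watch_record {
        uint32_t conn_id;    /* connection/domain that registered the watch */
        uint16_t path_len;   /* length of the watched path, excluding NUL */
        uint16_t token_len;  /* length of the client token, excluding NUL */
        /* path bytes, then token bytes, follow this header */
    };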


Juergen


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Jan Beulich @ 2019-07-18  9:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Leonard Foerster

On 17.07.2019 20:40, Andrew Cooper wrote:
> On 17/07/2019 14:02, Jan Beulich wrote:
>> On 17.07.2019 13:26, Andrew Cooper wrote:
>>> We do not want to be grovelling around in the old Xen's datastructures,
>>> because that adds a binary A=>B translation which is
>>> per-old-version-of-xen, meaning that you need a custom build of each
>>> target Xen which depends on the currently-running Xen, or have to
>>> maintain a matrix of old versions which will be dependent on the local
>>> changes, and therefore not suitable for upstream.
>> Now the question is what alternative you would suggest. By you
>> saying "the pinned state lives in the migration stream", I assume
>> you mean to imply that Dom0 state should be handed from old to
>> new Xen via such a stream (minus raw data page contents)?
> 
> Yes, and this in explicitly identified in the bullet point saying "We do
> only rely on domain state and no internal xen state".
> 
> In practice, it is going to be far more efficient to have Xen
> serialise/deserialise the domain register state etc, than to bounce it
> via hypercalls.  By the time you're doing that in Xen, adding dom0 as
> well is trivial.

So I must be missing some context here: How could hypercalls come into
the picture at all when it comes to "migrating" Dom0?

>>> The in-guest evtchn data structure will accumulate events just like a
>>> posted interrupt descriptor.  Real interrupts will queue in the LAPIC
>>> during the transition period.
>> Yes, that'll work as long as interrupts remain active from Xen's POV.
>> But if there's concern about a blackout period for HVM/PVH, then
>> surely there would also be such for PV.
> 
> The only fix for that is to reduce the length of the blackout period.
> We can't magically inject interrupts half way through the xen-to-xen
> transition, because we can't run vcpus at that point in time.

Hence David's proposal to "re-inject". We'd have to record them during
the blackout period, and inject once Dom0 is all set up again.

>>>> Re-using large data structures (or arrays thereof) may also turn out
>>>> useful in terms of latency until the new Xen actually becomes ready to
>>>> resume.
>>> When it comes to optimising the latency, there is a fair amount we might
>>> be able to do ahead of the critical region, but I still think this would
>>> be better done in terms of a "clean start" in the new Xen to reduce
>>> binary dependences.
>> Latency actually is only one aspect (albeit the larger the host, the more
>> relevant it is). Sufficient memory to have both old and new copies of the
>> data structures in place, plus the migration stream, is another. This
>> would especially become relevant when even DomU-s were to remain in
>> memory, rather than getting saved/restored.
> 
> But we're still talking about something which is on a multi-MB scale,
> rather than multi-GB scale.

On multi-TB systems frame_table[] is a multi-GB table. And with boot times
often scaling (roughly) with system size, live updating is (I guess) all
the more interesting on bigger systems.

Jan

* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Paul Durrant @ 2019-07-18  9:16 UTC (permalink / raw)
  To: 'Juergen Gross',
	Andrew Cooper, Sarah Newman, Foerster, Leonard, xen-devel

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Juergen Gross
> Sent: 18 July 2019 10:00
> To: Andrew Cooper <Andrew.Cooper3@citrix.com>; Sarah Newman <srn@prgmr.com>; Foerster, Leonard
> <foersleo@amazon.com>; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] Design session report: Live-Updating Xen
> 
> On 17.07.19 00:27, Andrew Cooper wrote:
> > On 16/07/2019 05:20, Sarah Newman wrote:
> >> On 7/15/19 8:48 PM, Juergen Gross wrote:
> >>> On 15.07.19 21:31, Sarah Newman wrote:
> >>>> On 7/15/19 11:57 AM, Foerster, Leonard wrote:
> >>>> ...
> >>>>> A key cornerstone for Live-update is guest transparent live migration
> >>>> ...
> >>>>>      -> for live migration: domid is a problem in this case
> >>>>>          -> randomize and pray does not work on smaller fleets
> >>>>>          -> this is not a problem for live-update
> >>>>>          -> BUT: as a community we shoudl make this restriction go away
> >>>>
> >>>> Andrew Cooper pointed out to me that manually assigning domain IDs
> >>>> is supported in much of the code already. If guest transparent live
> >>>> migration gets merged, we'll look at passing in a domain ID to xl,
> >>>> which would be good enough for us. I don't know about the other
> >>>> toolstacks.
> >>>
> >>> The main problem is the case where on the target host the domid of the
> >>> migrated domain is already in use by another domain. So you either need
> >>> a domid allocator spanning all hosts or the change of domid during
> >>> migration must be hidden from the guest for guest transparent migration.
> >>
> >> Yes. There are some cluster management systems which use xl rather
> >> than xapi.
> >> They could be extended to manage domain IDs if it's too difficult to
> >> allow
> >> the domain ID to change during migration.
> >
> > For a v1 feature, having a restriction of "you must manage domids across
> > the cluster" is a fine.  Guest-transparent migration is a very important
> > feature, and one where we are lacking in relation to other hypervisors.
> >
> > Longer term, we as the Xen community need to figure out a way to remove
> > the dependency on domids, at which point the cluster-wide management
> > restriction can be dropped.  This isn't going to be a trivial task, but
> > it will be a worthwhile one.
> 
> Another problem are Xenstore watches. With guest transparent LM they are
> lost today as there is currently no way to migrate them to the target
> Xenstore.
> 
> Live-Update could work around this issue via Xenstore-stubdom.

Watches are one problem. There's also the problem of pending transactions.

  Paul

> 
> 
> Juergen
> 

* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Paul Durrant @ 2019-07-18  9:29 UTC (permalink / raw)
  To: 'Foerster, Leonard'; +Cc: xen-devel

> -----Original Message-----
[snip]
> 
> Longer term vision:
> 
> * Have a tiny hypervisor between Guest and Xen that handles the common cases
> 	-> this enables (almost) zero downtime for the guest
> 	-> the tiny hypervisor will maintain the guest while the underlying xen is kexecing into new
> build
> 

This sounds very much more like a KVM system... The majority of Xen becomes the 'kernel and QEMU' part (i.e. it's the part that boot-straps the system, deals with IOMMU, APICs, etc. and incorporates the scheduler) and the tiny hypervisor is the 'kvm.ko' (deals with basic I/O and instruction emulation traps). Is that the general split you envisage?

  Paul


* Re: [Xen-devel] Design session report: Live-Updating Xen
From: Roger Pau Monné @ 2019-07-18  9:40 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Andrew Cooper, xen-devel, Foerster, Leonard, Sarah Newman

On Thu, Jul 18, 2019 at 11:00:23AM +0200, Juergen Gross wrote:
> On 17.07.19 00:27, Andrew Cooper wrote:
> > On 16/07/2019 05:20, Sarah Newman wrote:
> > > On 7/15/19 8:48 PM, Juergen Gross wrote:
> > > > On 15.07.19 21:31, Sarah Newman wrote:
> > > > > On 7/15/19 11:57 AM, Foerster, Leonard wrote:
> > > > > ...
> > > > > > A key cornerstone for Live-update is guest transparent live migration
> > > > > ...
> > > > > >      -> for live migration: domid is a problem in this case
> > > > > >          -> randomize and pray does not work on smaller fleets
> > > > > >          -> this is not a problem for live-update
> > > > > >          -> BUT: as a community we should make this restriction go away
> > > > > 
> > > > > Andrew Cooper pointed out to me that manually assigning domain IDs
> > > > > is supported in much of the code already. If guest transparent live
> > > > > migration gets merged, we'll look at passing in a domain ID to xl,
> > > > > which would be good enough for us. I don't know about the other
> > > > > toolstacks.
> > > > 
> > > > The main problem is the case where on the target host the domid of the
> > > > migrated domain is already in use by another domain. So you either need
> > > > a domid allocator spanning all hosts or the change of domid during
> > > > migration must be hidden from the guest for guest transparent migration.
> > > 
> > > Yes. There are some cluster management systems which use xl rather
> > > than xapi.
> > > They could be extended to manage domain IDs if it's too difficult to
> > > allow
> > > the domain ID to change during migration.
> > 
> > For a v1 feature, having a restriction of "you must manage domids across
> > > the cluster" is fine.  Guest-transparent migration is a very important
> > feature, and one where we are lacking in relation to other hypervisors.
> > 
> > Longer term, we as the Xen community need to figure out a way to remove
> > the dependency on domids, at which point the cluster-wide management
> > restriction can be dropped.  This isn't going to be a trivial task, but
> > it will be a worthwhile one.
> 
> > Another problem is Xenstore watches. With guest transparent LM they are
> lost today as there is currently no way to migrate them to the target
> Xenstore.

Hm, I guess I'm missing something, but xenstored running either in
dom0 or in a stubdomain should be completely unaware of the hypervisor
being updated under its feet. The hypervisor itself doesn't have any
knowledge of xenstore state.

Roger.


* Re: [Xen-devel] Design session report: Live-Updating Xen
  2019-07-18  9:40           ` Roger Pau Monné
@ 2019-07-18  9:43             ` Juergen Gross
  0 siblings, 0 replies; 17+ messages in thread
From: Juergen Gross @ 2019-07-18  9:43 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Sarah Newman, Leonard Foerster, xen-devel

On 18.07.19 11:40, Roger Pau Monné  wrote:
> On Thu, Jul 18, 2019 at 11:00:23AM +0200, Juergen Gross wrote:
>> On 17.07.19 00:27, Andrew Cooper wrote:
>>> On 16/07/2019 05:20, Sarah Newman wrote:
>>>> On 7/15/19 8:48 PM, Juergen Gross wrote:
>>>>> On 15.07.19 21:31, Sarah Newman wrote:
>>>>>> On 7/15/19 11:57 AM, Foerster, Leonard wrote:
>>>>>> ...
>>>>>>> A key cornerstone for Live-update is guest transparent live migration
>>>>>> ...
>>>>>>>       -> for live migration: domid is a problem in this case
>>>>>>>           -> randomize and pray does not work on smaller fleets
>>>>>>>           -> this is not a problem for live-update
>>>>>>>           -> BUT: as a community we should make this restriction go away
>>>>>>
>>>>>> Andrew Cooper pointed out to me that manually assigning domain IDs
>>>>>> is supported in much of the code already. If guest transparent live
>>>>>> migration gets merged, we'll look at passing in a domain ID to xl,
>>>>>> which would be good enough for us. I don't know about the other
>>>>>> toolstacks.
>>>>>
>>>>> The main problem is the case where on the target host the domid of the
>>>>> migrated domain is already in use by another domain. So you either need
>>>>> a domid allocator spanning all hosts or the change of domid during
>>>>> migration must be hidden from the guest for guest transparent migration.
>>>>
>>>> Yes. There are some cluster management systems which use xl rather
>>>> than xapi.
>>>> They could be extended to manage domain IDs if it's too difficult to
>>>> allow
>>>> the domain ID to change during migration.
>>>
>>> For a v1 feature, having a restriction of "you must manage domids across
>>> the cluster" is fine.  Guest-transparent migration is a very important
>>> feature, and one where we are lacking in relation to other hypervisors.
>>>
>>> Longer term, we as the Xen community need to figure out a way to remove
>>> the dependency on domids, at which point the cluster-wide management
>>> restriction can be dropped.  This isn't going to be a trivial task, but
>>> it will be a worthwhile one.
>>
>> Another problem is Xenstore watches. With guest transparent LM they are
>> lost today as there is currently no way to migrate them to the target
>> Xenstore.
> 
> Hm, I guess I'm missing something, but xenstored running either in
> dom0 or in a stubdomain should be completely unaware of the hypervisor
> being updated under its feet. The hypervisor itself doesn't have any
> knowledge of xenstore state.

Oh, right, I was thinking about guest transparent LM first and then
widened the scope to Live Update.

So this is a problem for guest transparent LM only.


Juergen



* Re: [Xen-devel] Design session report: Live-Updating Xen
  2019-07-18  9:15           ` Jan Beulich
@ 2019-07-18 12:09             ` Amit Shah
  0 siblings, 0 replies; 17+ messages in thread
From: Amit Shah @ 2019-07-18 12:09 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel, dwmw2, Leonard Foerster

On Thu, 2019-07-18 at 09:15 +0000, Jan Beulich wrote:
> On 17.07.2019 20:40, Andrew Cooper wrote:
> > On 17/07/2019 14:02, Jan Beulich wrote:
> > > On 17.07.2019 13:26, Andrew Cooper wrote:
> > > > We do not want to be grovelling around in the old Xen's
> > > > datastructures, because that adds a binary A=>B translation which
> > > > is per-old-version-of-xen, meaning that you need a custom build of
> > > > each target Xen which depends on the currently-running Xen, or
> > > > have to maintain a matrix of old versions which will be dependent
> > > > on the local changes, and therefore not suitable for upstream.
> > > 
> > > Now the question is what alternative you would suggest. By you
> > > saying "the pinned state lives in the migration stream", I assume
> > > you mean to imply that Dom0 state should be handed from old to
> > > new Xen via such a stream (minus raw data page contents)?
> > 
> > Yes, and this is explicitly identified in the bullet point saying
> > "We do only rely on domain state and no internal xen state".
> > 
> > In practice, it is going to be far more efficient to have Xen
> > serialise/deserialise the domain register state etc, than to bounce
> > it via hypercalls.  By the time you're doing that in Xen, adding
> > dom0 as well is trivial.
> 
> So I must be missing some context here: How could hypercalls come into
> the picture at all when it comes to "migrating" Dom0?

Xen will have to orchestrate the "save/restore" aspects of the domains
here.  The flow roughly will be:

1. One hypercall to load the new Xen binary in memory
2. Another hypercall to:
  a. Pause domains (including dom0),
  b. Mask interrupts,
  c. Serialize state,
  d. kexec into new Xen binary, and deserialize state

We had briefly considered Dom0 (or another stub domain) orchestrating
the whole serializing aspect here, but that's just too slow and will
create more problems in practice, so the idea was quickly dumped.
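
Purely as an illustration of that two-step flow, a toolstack-side sketch (the op numbers, the struct and the hypercall wrapper below are all invented for the example; none of them is an existing Xen interface):

    /*
     * Toolstack-side sketch of the two steps above.  Everything here is
     * invented for illustration only.
     */
    #include <stdint.h>

    #define LIVEUPDATE_OP_LOAD  1   /* step 1: stage the new Xen image */
    #define LIVEUPDATE_OP_EXEC  2   /* step 2: pause, serialise, kexec */

    struct liveupdate_load {
        uint64_t image_addr;        /* address of the staged mb2 image */
        uint64_t image_size;        /* size of the image in bytes      */
    };

    /* Stub standing in for whatever hypercall mechanism would carry this. */
    static int liveupdate_hypercall(unsigned int op, void *arg)
    {
        (void)op; (void)arg;
        return 0;
    }

    int live_update(uint64_t addr, uint64_t size)
    {
        struct liveupdate_load load = {
            .image_addr = addr,
            .image_size = size,
        };
        int rc = liveupdate_hypercall(LIVEUPDATE_OP_LOAD, &load);

        if ( rc )
            return rc;

        /* Point of no return: domains pause, state is serialised, kexec. */
        return liveupdate_hypercall(LIVEUPDATE_OP_EXEC, NULL);
    }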

> 
> > > > The in-guest evtchn data structure will accumulate events just
> > > > like a posted interrupt descriptor.  Real interrupts will queue
> > > > in the LAPIC during the transition period.
> > > 
> > > Yes, that'll work as long as interrupts remain active from Xen's
> > > POV. But if there's concern about a blackout period for HVM/PVH,
> > > then surely there would also be such for PV.
> > 
> > The only fix for that is to reduce the length of the blackout period.
> > We can't magically inject interrupts half way through the xen-to-xen
> > transition, because we can't run vcpus at that point in time.
> 
> Hence David's proposal to "re-inject". We'd have to record them during
> the blackout period, and inject once Dom0 is all set up again.

We'll need both: as little downtime as possible, and to later re-inject
interrupts when domains continue execution.  The fewer reinjections we
have to do the better; but overall, the less visible this maintenance
activity is, the better as well.
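
One way to picture that bookkeeping, as a sketch only (the structure and helpers below are invented; how vectors are actually captured and re-injected is exactly the open design question):

    /*
     * Sketch of per-vCPU blackout bookkeeping: one bit per vector, filled
     * while the domain is paused, drained once the new Xen has restored
     * the domain.  Invented for illustration, not a proposed interface.
     */
    #include <stdint.h>

    #define NR_VECTORS 256

    struct blackout_irqs {
        uint64_t pending[NR_VECTORS / 64];    /* one bit per vector */
    };

    static inline void record_vector(struct blackout_irqs *b, uint8_t vec)
    {
        b->pending[vec / 64] |= 1ULL << (vec % 64);
    }

    /* After resume: walk the bitmap and hand each vector to an injector. */
    static inline void replay_vectors(const struct blackout_irqs *b,
                                      void (*inject)(uint8_t vec))
    {
        unsigned int v;

        for ( v = 0; v < NR_VECTORS; v++ )
            if ( b->pending[v / 64] & (1ULL << (v % 64)) )
                inject((uint8_t)v);
    }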

> 
> > > > > Re-using large data structures (or arrays thereof) may also turn
> > > > > out useful in terms of latency until the new Xen actually
> > > > > becomes ready to resume.
> > > > 
> > > > When it comes to optimising the latency, there is a fair amount we
> > > > might be able to do ahead of the critical region, but I still
> > > > think this would be better done in terms of a "clean start" in the
> > > > new Xen to reduce binary dependences.
> > > 
> > > Latency actually is only one aspect (albeit the larger the host, the
> > > more relevant it is). Sufficient memory to have both old and new
> > > copies of the data structures in place, plus the migration stream,
> > > is another. This would especially become relevant when even DomU-s
> > > were to remain in memory, rather than getting saved/restored.
> > 
> > But we're still talking about something which is on a multi-MB scale,
> > rather than multi-GB scale.
> 
> On multi-TB systems frame_table[] is a multi-GB table. And with boot
> times often scaling (roughly) with system size, live updating is (I
> guess) all the more interesting on bigger systems.

We've not had to look closely at all these things yet - but this is
also perhaps the only point David and I keep quibbling about - will
there be any Xen state, and will we need to do anything about it?  The
ideal thing would be to have the new Xen start from scratch.  There's
an alternative idea here, though: *if* this is only during the setup
phase of the new Xen binary, we can perhaps get the allocations done
before pausing domains (i.e. in step 1 above).  That saves us time.
How this works for memory, and how much free memory we can expect to
have, is a question that can only be answered at runtime.  Ideally we
don't want to leave such systems behind.  So, getting creative with
serializing/deserializing such state is something I totally anticipate
having to do.  But don't tell David I said it once again...
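
For a rough sense of scale on the frame_table point (back-of-the-envelope only; 32 bytes per struct page_info is an assumption, the real size depends on architecture and Xen version):

    /*
     * 1 TiB of RAM at 4 KiB per frame is 2^28 pages; at an assumed
     * 32 bytes per struct page_info that is 2^33 bytes, i.e. 8 GiB of
     * frame_table on a single host.
     */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t host_bytes   = 1ULL << 40;   /* 1 TiB host           */
        uint64_t pages        = host_bytes >> 12;
        uint64_t page_info_sz = 32;           /* assumed, per x86 Xen */

        printf("frame_table ~= %llu GiB\n",
               (unsigned long long)((pages * page_info_sz) >> 30));
        return 0;
    }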



Thread overview: 17+ messages
2019-07-15 18:57 [Xen-devel] Design session report: Live-Updating Xen Foerster, Leonard
2019-07-15 19:31 ` Sarah Newman
2019-07-16  3:48   ` Juergen Gross
2019-07-16  4:20     ` Sarah Newman
2019-07-16 22:27       ` Andrew Cooper
2019-07-18  9:00         ` Juergen Gross
2019-07-18  9:16           ` Paul Durrant
2019-07-18  9:40           ` Roger Pau Monné
2019-07-18  9:43             ` Juergen Gross
2019-07-16 23:51 ` Andrew Cooper
2019-07-17  7:09   ` Jan Beulich
2019-07-17 11:26     ` Andrew Cooper
2019-07-17 13:02       ` Jan Beulich
2019-07-17 18:40         ` Andrew Cooper
2019-07-18  9:15           ` Jan Beulich
2019-07-18 12:09             ` Amit Shah
2019-07-18  9:29 ` Paul Durrant
