* [RFC PATCH 0/1] Removing RAMBlocks during migration
@ 2019-12-09  7:41 Yury Kotov
  2019-12-09  7:41 ` [RFC PATCH 1/1] migration: Remove vmstate_unregister_ram Yury Kotov
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Yury Kotov @ 2019-12-09  7:41 UTC
  To: qemu-devel
  Cc: Kevin Wolf, yc-core, Juan Quintela, Michael S. Tsirkin,
	Dr. David Alan Gilbert, Max Reitz, Igor Mammedov,
	Philippe Mathieu-Daudé

Hi,

I found that it's possible to remove a RAMBlock during migration, e.g. by a
device hot-unplug initiated by the guest (steps to reproduce are below).
I want to clarify whether removing (or even adding) a RAMBlock during
migration is a valid operation or a bug.

Currently, it may cause race conditions with the migration thread, and
migration may fail because of them. For instance, the vmstate_unregister_ram
function, which is called during PCIe device removal, does the following:
- Memsets idstr -> target may receive an unknown/zeroed idstr -> migration fails
- Marks the RAMBlock as non-migratable -> migration fails
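
The two effects listed above can be modeled in a few lines of C. This is only
a toy sketch, not QEMU code: struct toy_block, toy_unregister_ram and
toy_send_header are hypothetical names, and the model assumes nothing beyond
the behaviour described in the two bullets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a RAMBlock; the field names are illustrative only. */
struct toy_block {
    char idstr[64];
    bool migratable;
};

/* Models the two effects attributed to vmstate_unregister_ram above. */
void toy_unregister_ram(struct toy_block *b)
{
    memset(b->idstr, 0, sizeof(b->idstr));
    b->migratable = false;
}

/*
 * Models the source side writing a block header.
 * Returns the length of the id copied into out, or -1 if the block
 * is no longer marked migratable.
 */
int toy_send_header(const struct toy_block *b, char *out, size_t out_len)
{
    if (!b->migratable) {
        return -1;
    }
    size_t n = strlen(b->idstr);
    assert(n < out_len);
    memcpy(out, b->idstr, n + 1);   /* copy the id including its NUL */
    return (int)n;
}
```

In this model, calling toy_send_header after toy_unregister_ram returns -1.
If the sender had already passed the migratable check before the unregister
happened, it would copy an empty id instead, which the target cannot match to
any block.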

RAMBlock removal itself seems safe for the migration thread because of RCU.
But it seems to me there are other possible race conditions (I didn't test this):
- qemu_put_buffer_async -> saves a pointer to the RAMBlock's memory
   -> the block is freed outside the RCU section (between ram save iterations)
   -> qemu_fflush -> access to freed memory.
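
The suspected sequence can be illustrated with a toy model of a deferred
write queue. This is a sketch under assumptions, not QEMU's implementation:
all toy_* names are hypothetical, and the queue stores a pointer to the owning
region (rather than a raw host pointer) only so that the model can detect the
stale entry instead of actually reading freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_PENDING 8

struct toy_region {            /* stand-in for a RAMBlock's host memory */
    unsigned char *host;
    size_t len;
};

struct toy_pending {           /* queued write: no data is copied yet */
    const struct toy_region *owner;
    size_t off, len;
};

struct toy_file {
    struct toy_pending q[TOY_MAX_PENDING];
    size_t n;
};

/* Queue a write by reference, as the "async" put is described to do. */
bool toy_put_async(struct toy_file *f, const struct toy_region *r,
                   size_t off, size_t len)
{
    if (f->n == TOY_MAX_PENDING || r->host == NULL ||
        off > r->len || len > r->len - off) {
        return false;
    }
    f->q[f->n++] = (struct toy_pending){ r, off, len };
    return true;
}

/* Free the backing memory; queued entries are not told about it. */
void toy_region_free(struct toy_region *r)
{
    free(r->host);
    r->host = NULL;
    r->len = 0;
}

/*
 * Copy all queued data into out. Returns the number of bytes copied,
 * or -1 if an entry refers to freed memory (real code would read it)
 * or out is too small. The queue is emptied either way.
 */
long toy_flush(struct toy_file *f, unsigned char *out, size_t cap)
{
    size_t done = 0;
    long ret = 0;
    for (size_t i = 0; i < f->n; i++) {
        const struct toy_pending *p = &f->q[i];
        if (p->owner->host == NULL || p->len > cap - done) {
            ret = -1;
            break;
        }
        memcpy(out + done, p->owner->host + p->off, p->len);
        done += p->len;
    }
    f->n = 0;
    return ret < 0 ? ret : (long)done;
}
```

Queueing a write, freeing the region, and then flushing makes toy_flush
return -1. That is the point at which the real code path would touch freed
memory if the race exists.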

So, I have the following questions:
1. Is removing/adding a RAMBlock OK during migration?
2. If yes, then what should we do with vmstate_unregister_ram?
   - Just remove vmstate_unregister_ram (my RFC patch)
   - Refcount the RAMBlock's migratable/non-migratable state
   - Something else?
3. If it must not be possible, then maybe add
   assert(migration_is_idle()) in qemu_ram_free?
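
To make the refcount option in 2 and the assert in 3 more concrete, here is a
rough sketch in plain C. It is one possible reading of those ideas, not a
proposed patch: every toy_* name is hypothetical, and toy_migration_active
merely stands in for "not migration_is_idle()".

```c
#include <assert.h>
#include <stdbool.h>

struct toy_block {
    unsigned migratable_refs;  /* holders that need the block migratable */
    bool freed;
};

/* Stand-in for "migration is not idle"; set by the toy setter below. */
static bool toy_migration_active;

void toy_migration_set_active(bool active)
{
    toy_migration_active = active;
}

/* Option 2: the state stays migratable while anyone holds a reference. */
void toy_migratable_get(struct toy_block *b)
{
    b->migratable_refs++;
}

void toy_migratable_put(struct toy_block *b)
{
    assert(b->migratable_refs > 0);
    b->migratable_refs--;
}

bool toy_is_migratable(const struct toy_block *b)
{
    return b->migratable_refs > 0;
}

/* Option 3: freeing a block is only legal while migration is idle. */
bool toy_ram_free_allowed(void)
{
    return !toy_migration_active;
}

void toy_ram_free(struct toy_block *b)
{
    assert(toy_ram_free_allowed());
    b->freed = true;
}
```

With the refcount, a device dropping its reference in the middle of a
migration would not flip the block to non-migratable until the migration drops
its own reference. With the assert, an unplug during migration would abort
instead of racing, so it only makes sense if such removals are ruled out
elsewhere.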

P.S.
I'm working on a fix for the problem below and trying to choose the better
approach: allow device removal and fix all problems like this one, or fix a
particular device.

--------
How to reproduce device removal during migration:

1. Source QEMU command line (target is similar)
  $ x86_64-softmmu/qemu-system-x86_64 \
    -nodefaults -no-user-config -m 1024 -M q35 \
    -qmp unix:./src.sock,server,nowait \
    -drive file=./image,format=raw,if=virtio \
    -device ioh3420,id=pcie.1 \
    -device virtio-net,bus=pcie.1
2. Start migration at a low speed (to make reproduction easier)
3. Power off a device on the hotplug pcie.1 bus:
  $ echo 0 > /sys/bus/pci/slots/0/power
4. Increase migration speed and wait until migration fails

Most likely you will get something like this:
  qemu-system-x86_64: get_pci_config_device: Bad config data:
          i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
  qemu-system-x86_64: Failed to load PCIDevice:config
  qemu-system-x86_64: Failed to load
          ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
  qemu-system-x86_64: error while loading state for instance 0x0 of device
          '0000:00:03.0/ioh-3240-express-root-port'
  qemu-system-x86_64: load of migration failed: Invalid argument

This error only illustrates that a device can be removed during migration;
it does not illustrate the race conditions around RAMBlock removal.

Regards,
Yury

Yury Kotov (1):
  migration: Remove vmstate_unregister_ram

 hw/block/pflash_cfi01.c     | 1 -
 hw/block/pflash_cfi02.c     | 1 -
 hw/mem/pc-dimm.c            | 5 -----
 hw/misc/ivshmem.c           | 2 --
 hw/pci/pci.c                | 1 -
 include/migration/vmstate.h | 1 -
 migration/savevm.c          | 6 ------
 7 files changed, 17 deletions(-)

-- 
2.24.0




end of thread, other threads:[~2020-01-13 14:20 UTC | newest]

Thread overview: 10+ messages
2019-12-09  7:41 [RFC PATCH 0/1] Removing RAMBlocks during migration Yury Kotov
2019-12-09  7:41 ` [RFC PATCH 1/1] migration: Remove vmstate_unregister_ram Yury Kotov
2019-12-11 11:16 ` [RFC PATCH 0/1] Removing RAMBlocks during migration Dr. David Alan Gilbert
2019-12-23  8:51   ` Yury Kotov
2020-01-03 11:44     ` Dr. David Alan Gilbert
2020-01-07 20:08       ` Michael S. Tsirkin
2020-01-08 10:24         ` Dr. David Alan Gilbert
2020-01-13 14:18         ` Yury Kotov
2020-01-07 20:02 ` Michael S. Tsirkin
2020-01-08 13:40   ` Juan Quintela
