* [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
@ 2023-02-25 16:31 Peter Xu
2023-02-25 16:31 ` [PATCH RFC 1/4] memory: Make memory_listeners RCU-safe for real Peter Xu
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Peter Xu @ 2023-02-25 16:31 UTC (permalink / raw)
To: qemu-devel
Cc: Maxim Levitsky, Stefan Hajnoczi, Juan Quintela, Paolo Bonzini,
David Hildenbrand, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé,
peterx
[not for merging, but for discussion; this is something I found when
looking at another issue on Chuang's optimization for migration downtime]
Summary: we try to access memory_listeners, address_spaces, etc. in an RCU
way. However, we didn't implement them with RCU safety. This patchset
tries to do that; or at least to get closer.
NOTE! It still does this wrongly for now, so please feel free to treat this
as a thread for starting to discuss the problem, as the subject says.
The core problem here is how to make sure memory listeners are freed in an
RCU way when unlinking them from the global memory_listeners list.
The current patchset (in patch 1) does it with drain_call_rcu(), but of
course that's wrong, for at least two reasons:
(1) drain_call_rcu() will release the BQL; currently there's no way for me
to guarantee that releasing the BQL is safe here.
(2) memory_listener_unregister() can be called within an RCU read lock
itself (we're happy to take the RCU read lock in many places, but we
don't think much about how long it will be held; at least not as
strictly as the kernel variant, so we care less about that yet).
That means drain_call_rcu() would deadlock there, waiting for itself.
For an example, see Appendix A.
Side question to Stefan / Maxim: why do we need drain_call_rcu(), and what
is its difference from synchronize_rcu() at the API level, besides
releasing and retaking the BQL when it is held?
I have a few solutions in mind, though none of them looks pretty. Before
going further, I think I should raise this on the list and discuss. The
good thing is I don't think the fix is urgently needed, because hitting the
bug should be very rare in real life (I do think we have yet another bug
where we sometimes don't take the RCU lock for
memory_region_clear_dirty_bitmap either; see patch 3, which I think is only
exercised while migrating with virtio-mem / virtio-balloon). However, I
think it is still important to discuss this, or it will be even harder to
maintain in the future with more things piled up.
Solution A
==========
We can logically release everything that contains a MemoryListener with
g_free_rcu() rather than g_free(). That can actually be quite
straightforward in some cases (e.g. VFIOContainer, where we can simply do
s/g_free/g_free_rcu/ at the two call sites), but not in others (vhost_dev,
where there can be tons of other structures wrapping vhost_dev, and it's
sometimes not straightforward to tell when the object gets freed).
Solution B
==========
Can we omit taking the RCU read lock if we already hold the BQL? Logically
that's safe for most of the cases where we use RCU with the BQL serializing
modifications of the protected structure, but it may cause issues when we
extend RCU to outside-BQL usages. IOW, we may need a BQL-flavoured
rcu_read_lock_bql() that we use to reference things protected by the BQL,
then replace most of the RCU read locks in QEMU with it; then I think maybe
we can safely skip taking the RCU read lock when the BQL is held, but I am
really not sure about that yet.
Thoughts? Is there any better way to do this? Or am I perhaps overly worried?
Thanks,
Appendix A
==========
Sample of calling memory_listener_unregister() with both BQL & RCU held
(RCU held in address_space_write).
3 0x0000562b8da6bd60 in memory_listener_unregister (listener=0x562b8feeae48) at ../softmmu/memory.c:3065
4 0x0000562b8da1ad47 in virtio_device_unrealize (dev=0x562b8feead60) at ../hw/virtio/virtio.c:3631
5 0x0000562b8db1e2f1 in device_set_realized (obj=0x562b8feead60, value=false, errp=0x562b8e617138 <error_abort>) at ../hw/core/qdev.c:599
6 0x0000562b8db27e45 in property_set_bool (obj=0x562b8feead60, v=0x562b90715670, name=0x562b8dec2b19 "realized", opaque=0x562b8fc46070, errp=0x562b8e617138 <error_abort>)
at ../qom/object.c:2285
7 0x0000562b8db25e16 in object_property_set (obj=0x562b8feead60, name=0x562b8dec2b19 "realized", v=0x562b90715670, errp=0x562b8e617138 <error_abort>) at ../qom/object.c:1420
8 0x0000562b8db2a2ad in object_property_set_qobject (obj=0x562b8feead60, name=0x562b8dec2b19 "realized", value=0x562b8ff09800, errp=0x562b8e617138 <error_abort>)
at ../qom/qom-qobject.c:28
9 0x0000562b8db26181 in object_property_set_bool (obj=0x562b8feead60, name=0x562b8dec2b19 "realized", value=false, errp=0x562b8e617138 <error_abort>) at ../qom/object.c:1489
10 0x0000562b8db1d82a in qdev_unrealize (dev=0x562b8feead60) at ../hw/core/qdev.c:306
11 0x0000562b8db1a500 in bus_set_realized (obj=0x562b8feeace0, value=false, errp=0x562b8e617138 <error_abort>) at ../hw/core/bus.c:205
12 0x0000562b8db27e45 in property_set_bool (obj=0x562b8feeace0, v=0x562b90e71800, name=0x562b8dec21f8 "realized", opaque=0x562b908fc660, errp=0x562b8e617138 <error_abort>)
at ../qom/object.c:2285
13 0x0000562b8db25e16 in object_property_set (obj=0x562b8feeace0, name=0x562b8dec21f8 "realized", v=0x562b90e71800, errp=0x562b8e617138 <error_abort>) at ../qom/object.c:1420
14 0x0000562b8db2a2ad in object_property_set_qobject (obj=0x562b8feeace0, name=0x562b8dec21f8 "realized", value=0x562b8ff097e0, errp=0x562b8e617138 <error_abort>)
at ../qom/qom-qobject.c:28
15 0x0000562b8db26181 in object_property_set_bool (obj=0x562b8feeace0, name=0x562b8dec21f8 "realized", value=false, errp=0x562b8e617138 <error_abort>) at ../qom/object.c:1489
16 0x0000562b8db1a3eb in qbus_unrealize (bus=0x562b8feeace0) at ../hw/core/bus.c:179
17 0x0000562b8db1e25f in device_set_realized (obj=0x562b8fee29a0, value=false, errp=0x562b8e617138 <error_abort>) at ../hw/core/qdev.c:593
18 0x0000562b8db27e45 in property_set_bool (obj=0x562b8fee29a0, v=0x562b90e6eac0, name=0x562b8dec2b19 "realized", opaque=0x562b8fc46070, errp=0x562b8e617138 <error_abort>)
at ../qom/object.c:2285
19 0x0000562b8db25e16 in object_property_set (obj=0x562b8fee29a0, name=0x562b8dec2b19 "realized", v=0x562b90e6eac0, errp=0x562b8e617138 <error_abort>) at ../qom/object.c:1420
20 0x0000562b8db2a2ad in object_property_set_qobject (obj=0x562b8fee29a0, name=0x562b8dec2b19 "realized", value=0x562b908fc800, errp=0x562b8e617138 <error_abort>)
at ../qom/qom-qobject.c:28
21 0x0000562b8db26181 in object_property_set_bool (obj=0x562b8fee29a0, name=0x562b8dec2b19 "realized", value=false, errp=0x562b8e617138 <error_abort>) at ../qom/object.c:1489
22 0x0000562b8db1d82a in qdev_unrealize (dev=0x562b8fee29a0) at ../hw/core/qdev.c:306
23 0x0000562b8d603a4a in acpi_pcihp_device_unplug_cb (hotplug_dev=0x562b90add210, s=0x562b90ade510, dev=0x562b8fee29a0, errp=0x562b8e617138 <error_abort>) at ../hw/acpi/pcihp.c:415
24 0x0000562b8d601397 in piix4_device_unplug_cb (hotplug_dev=0x562b90add210, dev=0x562b8fee29a0, errp=0x562b8e617138 <error_abort>) at ../hw/acpi/piix4.c:394
25 0x0000562b8db2265d in hotplug_handler_unplug (plug_handler=0x562b90add210, plugged_dev=0x562b8fee29a0, errp=0x562b8e617138 <error_abort>) at ../hw/core/hotplug.c:56
26 0x0000562b8d603386 in acpi_pcihp_eject_slot (s=0x562b90ade510, bsel=0, slots=64) at ../hw/acpi/pcihp.c:251
27 0x0000562b8d603e5f in pci_write (opaque=0x562b90ade510, addr=8, data=64, size=4) at ../hw/acpi/pcihp.c:528
28 0x0000562b8da641cd in memory_region_write_accessor (mr=0x562b90adf120, addr=8, value=0x7fff5484aae8, size=4, shift=0, mask=4294967295, attrs=...) at ../softmmu/memory.c:493
29 0x0000562b8da6441a in access_with_adjusted_size
(addr=8, value=0x7fff5484aae8, size=4, access_size_min=1, access_size_max=4, access_fn=0x562b8da640d7 <memory_region_write_accessor>, mr=0x562b90adf120, attrs=...) at ../softmmu/memory5
30 0x0000562b8da676ba in memory_region_dispatch_write (mr=0x562b90adf120, addr=8, data=64, op=MO_32, attrs=...) at ../softmmu/memory.c:1515
31 0x0000562b8da752a4 in flatview_write_continue (fv=0x562b8fefc190, addr=44552, attrs=..., ptr=0x7fff5484ac64, len=4, addr1=8, l=4, mr=0x562b90adf120) at ../softmmu/physmem.c:2826
32 0x0000562b8da75407 in flatview_write (fv=0x562b8fefc190, addr=44552, attrs=..., buf=0x7fff5484ac64, len=4) at ../softmmu/physmem.c:2868
33 0x0000562b8da757b7 in address_space_write (as=0x562b8e5f9f60 <address_space_io>, addr=44552, attrs=..., buf=0x7fff5484ac64, len=4) at ../softmmu/physmem.c:2964
34 0x0000562b8da60ad9 in cpu_outl (addr=44552, val=64) at ../softmmu/ioport.c:80
35 0x0000562b8da7a952 in qtest_process_command (chr=0x562b8fc49070, words=0x562b90e57cb0) at ../softmmu/qtest.c:482
36 0x0000562b8da7c565 in qtest_process_inbuf (chr=0x562b8fc49070, inbuf=0x562b8fe11da0) at ../softmmu/qtest.c:802
37 0x0000562b8da7c5f6 in qtest_read (opaque=0x562b8fc49070, buf=0x7fff5484afc0 "outl 0xae08 0x40\n306c\n", size=17) at ../softmmu/qtest.c:814
38 0x0000562b8dc3bc25 in qemu_chr_be_write_impl (s=0x562b8fe3f800, buf=0x7fff5484afc0 "outl 0xae08 0x40\n306c\n", len=17) at ../chardev/char.c:202
39 0x0000562b8dc3bc89 in qemu_chr_be_write (s=0x562b8fe3f800, buf=0x7fff5484afc0 "outl 0xae08 0x40\n306c\n", len=17) at ../chardev/char.c:214
40 0x0000562b8dc37846 in tcp_chr_read (chan=0x562b8fdf5e80, cond=G_IO_IN, opaque=0x562b8fe3f800) at ../chardev/char-socket.c:508
41 0x0000562b8db30e65 in qio_channel_fd_source_dispatch (source=0x562b90028ea0, callback=0x562b8dc376cb <tcp_chr_read>, user_data=0x562b8fe3f800) at ../io/channel-watch.c:84
42 0x00007f1d80e7cfaf in g_main_dispatch (context=0x562b8fc43e80) at ../glib/gmain.c:3417
43 g_main_context_dispatch (context=0x562b8fc43e80) at ../glib/gmain.c:4135
44 0x0000562b8dd1d9c4 in glib_pollfds_poll () at ../util/main-loop.c:294
45 0x0000562b8dd1da41 in os_host_main_loop_wait (timeout=27462700) at ../util/main-loop.c:317
46 0x0000562b8dd1db4f in main_loop_wait (nonblocking=0) at ../util/main-loop.c:603
47 0x0000562b8d822785 in qemu_main_loop () at ../softmmu/runstate.c:730
48 0x0000562b8d5ad486 in qemu_default_main () at ../softmmu/main.c:37
49 0x0000562b8d5ad4bc in main (argc=21, argv=0x7fff5484c2c8) at ../softmmu/main.c:48
Peter Xu (4):
memory: Make memory_listeners RCU-safe for real
memory: Use rcu list variance for address_spaces modifications
memory: Protect memory_region_clear_dirty_bitmap with RCU
memory: Use rcu traversal in memory_region_to_address_space
softmmu/memory.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
--
2.39.1
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH RFC 1/4] memory: Make memory_listeners RCU-safe for real
2023-02-25 16:31 [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Peter Xu
@ 2023-02-25 16:31 ` Peter Xu
2023-02-25 16:31 ` [PATCH RFC 2/4] memory: Use rcu list variance for address_spaces modifications Peter Xu
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2023-02-25 16:31 UTC (permalink / raw)
To: qemu-devel
Cc: Maxim Levitsky, Stefan Hajnoczi, Juan Quintela, Paolo Bonzini,
David Hildenbrand, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé,
peterx
I think the plan was to make memory_listeners RCU-safe, but maybe it never
really was. This patch does it for real, by using the RCU variants of the
qtailq helpers when modifying memory_listeners. Modifications should be
serialized by the BQL, so add assertions to the register/unregister
functions.
Wait for a quiescent period before returning from memory listener
unregistration, to make sure no RCU reader can still access the unlinked
listener.
AddressSpace.listeners is protected in the same way, so do the same there.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
softmmu/memory.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9d64efca26..a63e0bcbb7 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -3016,30 +3016,32 @@ void memory_listener_register(MemoryListener *listener, AddressSpace *as)
/* Only one of them can be defined for a listener */
assert(!(listener->log_sync && listener->log_sync_global));
+ /* Ownership of memory_listeners & as->listeners for modifications */
+ assert(qemu_mutex_iothread_locked());
listener->address_space = as;
if (QTAILQ_EMPTY(&memory_listeners)
|| listener->priority >= QTAILQ_LAST(&memory_listeners)->priority) {
- QTAILQ_INSERT_TAIL(&memory_listeners, listener, link);
+ QTAILQ_INSERT_TAIL_RCU(&memory_listeners, listener, link);
} else {
QTAILQ_FOREACH(other, &memory_listeners, link) {
if (listener->priority < other->priority) {
break;
}
}
- QTAILQ_INSERT_BEFORE(other, listener, link);
+ QTAILQ_INSERT_BEFORE_RCU(other, listener, link);
}
if (QTAILQ_EMPTY(&as->listeners)
|| listener->priority >= QTAILQ_LAST(&as->listeners)->priority) {
- QTAILQ_INSERT_TAIL(&as->listeners, listener, link_as);
+ QTAILQ_INSERT_TAIL_RCU(&as->listeners, listener, link_as);
} else {
QTAILQ_FOREACH(other, &as->listeners, link_as) {
if (listener->priority < other->priority) {
break;
}
}
- QTAILQ_INSERT_BEFORE(other, listener, link_as);
+ QTAILQ_INSERT_BEFORE_RCU(other, listener, link_as);
}
listener_add_address_space(listener, as);
@@ -3051,9 +3053,14 @@ void memory_listener_unregister(MemoryListener *listener)
return;
}
+ /* Ownership of memory_listeners & as->listeners for modifications */
+ assert(qemu_mutex_iothread_locked());
+
listener_del_address_space(listener, listener->address_space);
- QTAILQ_REMOVE(&memory_listeners, listener, link);
- QTAILQ_REMOVE(&listener->address_space->listeners, listener, link_as);
+ QTAILQ_REMOVE_RCU(&memory_listeners, listener, link);
+ QTAILQ_REMOVE_RCU(&listener->address_space->listeners, listener, link_as);
+ /* Wait for RCU readers to finish. NOTE! this may release BQL */
+ drain_call_rcu();
listener->address_space = NULL;
}
--
2.39.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH RFC 2/4] memory: Use rcu list variance for address_spaces modifications
2023-02-25 16:31 [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Peter Xu
2023-02-25 16:31 ` [PATCH RFC 1/4] memory: Make memory_listeners RCU-safe for real Peter Xu
@ 2023-02-25 16:31 ` Peter Xu
2023-02-25 16:31 ` [PATCH RFC 3/4] memory: Protect memory_region_clear_dirty_bitmap with RCU Peter Xu
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2023-02-25 16:31 UTC (permalink / raw)
To: qemu-devel
Cc: Maxim Levitsky, Stefan Hajnoczi, Juan Quintela, Paolo Bonzini,
David Hildenbrand, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé,
peterx
AddressSpace should be RCU-safe already, since it's released with call_rcu().
Change the list insertion/removal to use the RCU variants.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
softmmu/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/softmmu/memory.c b/softmmu/memory.c
index a63e0bcbb7..c48e9cc6ed 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -3079,7 +3079,7 @@ void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
as->ioeventfd_nb = 0;
as->ioeventfds = NULL;
QTAILQ_INIT(&as->listeners);
- QTAILQ_INSERT_TAIL(&address_spaces, as, address_spaces_link);
+ QTAILQ_INSERT_TAIL_RCU(&address_spaces, as, address_spaces_link);
as->name = g_strdup(name ? name : "anonymous");
address_space_update_topology(as);
address_space_update_ioeventfds(as);
@@ -3103,7 +3103,7 @@ void address_space_destroy(AddressSpace *as)
memory_region_transaction_begin();
as->root = NULL;
memory_region_transaction_commit();
- QTAILQ_REMOVE(&address_spaces, as, address_spaces_link);
+ QTAILQ_REMOVE_RCU(&address_spaces, as, address_spaces_link);
/* At this point, as->dispatch and as->current_map are dummy
* entries that the guest should never use. Wait for the old
--
2.39.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH RFC 3/4] memory: Protect memory_region_clear_dirty_bitmap with RCU
2023-02-25 16:31 [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Peter Xu
2023-02-25 16:31 ` [PATCH RFC 1/4] memory: Make memory_listeners RCU-safe for real Peter Xu
2023-02-25 16:31 ` [PATCH RFC 2/4] memory: Use rcu list variance for address_spaces modifications Peter Xu
@ 2023-02-25 16:31 ` Peter Xu
2023-02-25 16:31 ` [PATCH RFC 4/4] memory: Use rcu traversal in memory_region_to_address_space Peter Xu
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2023-02-25 16:31 UTC (permalink / raw)
To: qemu-devel
Cc: Maxim Levitsky, Stefan Hajnoczi, Juan Quintela, Paolo Bonzini,
David Hildenbrand, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé,
peterx
The clear-dirty-bitmap operation needs to walk memory_listeners, but the
context may not hold the BQL.
These callers hold BQL for it:
cpu_physical_memory_sync_dirty_bitmap
dirtyrate_manual_reset_protect
These callers hold RCU for it:
migration_clear_memory_region_dirty_bitmap [1]
cpu_physical_memory_test_and_clear_dirty
The above case [1] is extremely non-obvious and probably still buggy,
because:
- Either the RCU read lock was taken very high up the stack, in
ram_save_iterate() or ram_save_queue_pages(), where it was probably
taken for the sake of ramblock references (which are also
RCU-protected), or,
- I _think_ there's a path that skips taking any lock (e.g. the other
path, migration_clear_memory_region_dirty_bitmap_range, which is used by
virtio-mem or virtio-balloon, may not really take the RCU read lock at
all, nor the BQL).
Take the RCU read lock in memory_region_clear_dirty_bitmap() to make sure
it's not missed.
The RCU read lock is also needed for address_space_get_flatview(), so this
generally makes the RCU section larger, covering the whole walking process
when it wasn't taken before, which is wanted.
This should be the only place where we referenced memory_listeners (or
as->listeners) without being guaranteed to hold either the BQL or RCU.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
softmmu/memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/softmmu/memory.c b/softmmu/memory.c
index c48e9cc6ed..95cdcaeccf 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2270,7 +2270,8 @@ void memory_region_clear_dirty_bitmap(MemoryRegion *mr, hwaddr start,
FlatRange *fr;
hwaddr sec_start, sec_end, sec_size;
- QTAILQ_FOREACH(listener, &memory_listeners, link) {
+ RCU_READ_LOCK_GUARD();
+ QTAILQ_FOREACH_RCU(listener, &memory_listeners, link) {
if (!listener->log_clear) {
continue;
}
--
2.39.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH RFC 4/4] memory: Use rcu traversal in memory_region_to_address_space
2023-02-25 16:31 [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Peter Xu
` (2 preceding siblings ...)
2023-02-25 16:31 ` [PATCH RFC 3/4] memory: Protect memory_region_clear_dirty_bitmap with RCU Peter Xu
@ 2023-02-25 16:31 ` Peter Xu
2023-03-01 0:09 ` [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Stefan Hajnoczi
2023-03-02 9:46 ` David Hildenbrand
5 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2023-02-25 16:31 UTC (permalink / raw)
To: qemu-devel
Cc: Maxim Levitsky, Stefan Hajnoczi, Juan Quintela, Paolo Bonzini,
David Hildenbrand, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé,
peterx
memory_region_to_address_space() is the only function that walks the
address_spaces list with only the RCU read lock held. Use the RCU
traversal macro to reflect that.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
softmmu/memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 95cdcaeccf..0b652d9597 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -566,7 +566,7 @@ static AddressSpace *memory_region_to_address_space(MemoryRegion *mr)
while (mr->container) {
mr = mr->container;
}
- QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
+ QTAILQ_FOREACH_RCU(as, &address_spaces, address_spaces_link) {
if (mr == as->root) {
return as;
}
--
2.39.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-02-25 16:31 [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Peter Xu
` (3 preceding siblings ...)
2023-02-25 16:31 ` [PATCH RFC 4/4] memory: Use rcu traversal in memory_region_to_address_space Peter Xu
@ 2023-03-01 0:09 ` Stefan Hajnoczi
2023-03-01 16:08 ` Peter Xu
2023-03-02 9:46 ` David Hildenbrand
5 siblings, 1 reply; 15+ messages in thread
From: Stefan Hajnoczi @ 2023-03-01 0:09 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Maxim Levitsky, Juan Quintela, Paolo Bonzini,
David Hildenbrand, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
[-- Attachment #1: Type: text/plain, Size: 2357 bytes --]
On Sat, Feb 25, 2023 at 11:31:37AM -0500, Peter Xu wrote:
> [not for merging, but for discussion; this is something I found when
> looking at another issue on Chuang's optimization for migration downtime]
>
> Summary: we tried to access memory_listeners, address_spaces, etc. in RCU
> way. However we didn't implement them with RCU-safety. This patchset is
> trying to do that; at least making it closer.
>
> NOTE! It's doing it wrongly for now, so please feel free to see this as a
> thread to start discussing this problem, as in subject.
>
> The core problem here is how to make sure memory listeners will be freed in
> RCU ways, per when unlinking them from the global memory_listeners list.
>
> The current patchset (in patch 1) did it with drain_call_rcu(), but of
> course it's wrong, because of at least two things:
>
> (1) drain_call_rcu() will release BQL; currently there's no way to me to
> guarantee that releasing BQL is safe here.
>
> (2) memory_listener_unregister() can be called within a RCU read lock
> itself (we're so happy to take rcu read lock in many places but we
> don't think much on how long it'll be taken; at least not as strict
> as the kernel variance, so we're just less care about that fact yet).
> It means, drain_call_rcu() should deadlock there waiting for itself.
> For an example, see Appendix A.
>
> Side question to Stefan / Maxim: why do we need drain_call_rcu() and what's
> its difference from synchronize_rcu() in API level besides releasing and
> retaking BQL when taken?
Hi,
I haven't taken a look at the patches or thought about the larger
problem you're tackling here, but I wanted to reply to this specific
question.
It's been a long time since Maxim, Paolo, and I discussed this, but
drain_call_rcu() and synchronize_rcu() do different things:
- drain_call_rcu() is about waiting until the current thread's
call_rcu() callbacks have completed.
- synchronize_rcu() is about waiting until there are no more readers in
the last grace period.
Calling synchronize_rcu() doesn't guarantee that call_rcu_thread() has
completed pending call_rcu() callbacks. Therefore it's not appropriate
for the existing drain_call_rcu() callers because they rely on previous
call_rcu() callbacks to have finished.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-03-01 0:09 ` [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Stefan Hajnoczi
@ 2023-03-01 16:08 ` Peter Xu
0 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2023-03-01 16:08 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Maxim Levitsky, Juan Quintela, Paolo Bonzini,
David Hildenbrand, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
On Tue, Feb 28, 2023 at 07:09:57PM -0500, Stefan Hajnoczi wrote:
> On Sat, Feb 25, 2023 at 11:31:37AM -0500, Peter Xu wrote:
> > [not for merging, but for discussion; this is something I found when
> > looking at another issue on Chuang's optimization for migration downtime]
> >
> > Summary: we tried to access memory_listeners, address_spaces, etc. in RCU
> > way. However we didn't implement them with RCU-safety. This patchset is
> > trying to do that; at least making it closer.
> >
> > NOTE! It's doing it wrongly for now, so please feel free to see this as a
> > thread to start discussing this problem, as in subject.
> >
> > The core problem here is how to make sure memory listeners will be freed in
> > RCU ways, per when unlinking them from the global memory_listeners list.
> >
> > The current patchset (in patch 1) did it with drain_call_rcu(), but of
> > course it's wrong, because of at least two things:
> >
> > (1) drain_call_rcu() will release BQL; currently there's no way to me to
> > guarantee that releasing BQL is safe here.
> >
> > (2) memory_listener_unregister() can be called within a RCU read lock
> > itself (we're so happy to take rcu read lock in many places but we
> > don't think much on how long it'll be taken; at least not as strict
> > as the kernel variance, so we're just less care about that fact yet).
> > It means, drain_call_rcu() should deadlock there waiting for itself.
> > For an example, see Appendix A.
> >
> > Side question to Stefan / Maxim: why do we need drain_call_rcu() and what's
> > its difference from synchronize_rcu() in API level besides releasing and
> > retaking BQL when taken?
>
> Hi,
> I haven't taken a look at the patches or thought about the larger
> problem you're tackling here, but I wanted to reply to this specific
> question.
>
> It's been a long time since Maxim, Paolo, and I discussed this, but
> drain_call_rcu() and synchronize_rcu() do different things:
> - drain_call_rcu() is about waiting until the current thread's
> call_rcu() callbacks have completed.
> - synchronize_rcu() is about waiting until there are no more readers in
> the last grace period.
>
> Calling synchronize_rcu() doesn't guarantee that call_rcu_thread() has
> completed pending call_rcu() callbacks. Therefore it's not appropriate
> for the existing drain_call_rcu() callers because they rely on previous
> call_rcu() callbacks to have finished.
Ah, I missed that detail.
I was wondering whether such a requirement could also be met with a
customized RCU callback that simply kicks a signal after the real "free" is
done, while the call_rcu() context waits for the signal. It's just that
assuming RCU callbacks will be executed in order is slightly tricky. But I
guess it's also hard if the call_rcu() is deep in the stack, so
drain_call_rcu() should avoid fiddling with such details.
Thanks Stefan!
--
Peter Xu
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-02-25 16:31 [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Peter Xu
` (4 preceding siblings ...)
2023-03-01 0:09 ` [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Stefan Hajnoczi
@ 2023-03-02 9:46 ` David Hildenbrand
2023-03-02 14:45 ` Peter Xu
5 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2023-03-02 9:46 UTC (permalink / raw)
To: Peter Xu, qemu-devel
Cc: Maxim Levitsky, Stefan Hajnoczi, Juan Quintela, Paolo Bonzini,
Dr . David Alan Gilbert, Chuang Xu, Philippe Mathieu-Daudé
On 25.02.23 17:31, Peter Xu wrote:
> [not for merging, but for discussion; this is something I found when
> looking at another issue on Chuang's optimization for migration downtime]
>
> Summary: we tried to access memory_listeners, address_spaces, etc. in RCU
> way. However we didn't implement them with RCU-safety. This patchset is
> trying to do that; at least making it closer.
>
> NOTE! It's doing it wrongly for now, so please feel free to see this as a
> thread to start discussing this problem, as in subject.
>
> The core problem here is how to make sure memory listeners will be freed in
> RCU ways, per when unlinking them from the global memory_listeners list.
Can you elaborate why we would want to do that? Is there a real reason
we cannot hold the BQL when unregistering a listener?
Or could we use any other, more fine-grained, lock to protect the memory
listeners?
Naive me would think that any interactions between someone updating the
memory listeners, and a listener getting removed, would require some
careful synchronization (to not rip a notifier out while someone else
notifies -- what is the still registered notifier supposed to do with
notifications while it is already going away?), instead of doing it via RCU.
I'm all for using RCU if it improves performance and keeps things
simple. If RCU is not required for performance reasons and overcomplicates
the implementation, maybe using locking is the better choice.
TBH, so far I thought that any memory listener registering/unregistering
*requires* the BQL, and everything else is a BUG.
--
Thanks,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-03-02 9:46 ` David Hildenbrand
@ 2023-03-02 14:45 ` Peter Xu
2023-03-02 14:56 ` Peter Xu
2023-03-02 15:11 ` David Hildenbrand
0 siblings, 2 replies; 15+ messages in thread
From: Peter Xu @ 2023-03-02 14:45 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Maxim Levitsky, Stefan Hajnoczi, Juan Quintela,
Paolo Bonzini, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
On Thu, Mar 02, 2023 at 10:46:56AM +0100, David Hildenbrand wrote:
> On 25.02.23 17:31, Peter Xu wrote:
> > [not for merging, but for discussion; this is something I found when
> > looking at another issue on Chuang's optimization for migration downtime]
> >
> > Summary: we tried to access memory_listeners, address_spaces, etc. in RCU
> > way. However we didn't implement them with RCU-safety. This patchset is
> > trying to do that; at least making it closer.
> >
> > NOTE! It's doing it wrongly for now, so please feel free to see this as a
> > thread to start discussing this problem, as in subject.
> >
> > The core problem here is how to make sure memory listeners will be freed in
> > RCU ways, per when unlinking them from the global memory_listeners list.
>
> Can you elaborate why we would want to do that? Is there a real reason we
> cannot hold the BQL when unregistering a listener?
Yes, afaict we must hold the BQL when unregistering any listener for now.
I added an explicit assert for that in patch 1.
We want to do that because potentially there are RCU readers accessing
these two lists, so taking the BQL alone is not enough here. We need to
release the objects only after all users are gone.
We already do that for address spaces, but afaict the listener part was
overlooked. The challenge here is how to achieve the same for listeners.
>
> Or could we use any other, more fine-grained, lock to protect the memory
> listeners?
>
> Naive me would think that any interactions between someone updating the
> memory listeners, and a listener getting removed, would require some careful
> synchronization (to not rip a notifier out while someone else notifies --
> what is the still registered notifier supposed to do with notifications
> while it is already going away?), instead of doing it via RCU.
>
> I'm all for using RCU if it improves performance and keeps things simple. If
> RCU is neither required for performance reason and overcomplicates the
> implementation, maybe using locking is the better choice.
For ASes, one major RCU user is memory_region_find_rcu().
For listeners, the only path that doesn't take the BQL (afaict) is
memory_region_clear_dirty_bitmap(). Maybe you'll have some thoughts here
on the side effects of taking it, because iirc it's in either the
virtio-mem or the balloon path for page hinting.
In short, so far I don't know whether it's possible to have all paths take
the BQL without regressing anything.
>
> TBH, so far I thought that any memory_listeners register/unregistering
> *requires* the BQL, and everything else is a BUG.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-03-02 14:45 ` Peter Xu
@ 2023-03-02 14:56 ` Peter Xu
2023-03-02 15:11 ` David Hildenbrand
1 sibling, 0 replies; 15+ messages in thread
From: Peter Xu @ 2023-03-02 14:56 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Maxim Levitsky, Stefan Hajnoczi, Juan Quintela,
Paolo Bonzini, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
On Thu, Mar 02, 2023 at 09:45:35AM -0500, Peter Xu wrote:
> On Thu, Mar 02, 2023 at 10:46:56AM +0100, David Hildenbrand wrote:
> > On 25.02.23 17:31, Peter Xu wrote:
> > > [not for merging, but for discussion; this is something I found when
> > > looking at another issue on Chuang's optimization for migration downtime]
> > >
> > > Summary: we tried to access memory_listeners, address_spaces, etc. in RCU
> > > way. However we didn't implement them with RCU-safety. This patchset is
> > > trying to do that; at least making it closer.
> > >
> > > NOTE! It's doing it wrongly for now, so please feel free to see this as a
> > > thread to start discussing this problem, as in subject.
> > >
> > > The core problem here is how to make sure memory listeners will be freed in
> > > RCU ways, per when unlinking them from the global memory_listeners list.
> >
> > Can you elaborate why we would want to do that? Is there a real reason we
> > cannot hold the BQL when unregistering a listener?
>
> Yes, afaict we must hold the BQL when unregistering any listener for now. I
> added an explicit assert for that in patch 1.
>
> We want to do that because potentially we have RCU readers accessing these
> two lists, so here taking BQL only is not enough. We need to release the
> objects after all users are gone.
>
> We already do that for address spaces, but afaict the listener part was
> overlooked. The challenge here is how to achieve the same for listeners.
>
> >
> > Or could we use any other, more fine-grained, lock to protect the memory
> > listeners?
> >
> > Naive me would think that any interactions between someone updating the
> > memory listeners, and a listener getting removed, would require some careful
> > synchronization (to not rip a notifier out while someone else notifies --
> > what is the still registered notifier supposed to do with notifications
> > while it is already going away?), instead of doing it via RCU.
> >
> > I'm all for using RCU if it improves performance and keeps things simple. If
> > RCU is neither required for performance reason and overcomplicates the
> > implementation, maybe using locking is the better choice.
>
> For ASes, one major RCU user is memory_region_find_rcu().
>
> For listeners, the only path that doesn't take BQL (afaict) is
> memory_region_clear_dirty_bitmap(). Maybe you'll have some points here on
> the side effect of taking it because it's in either virtio-mem or balloon
> path for page hinting iirc.
Ah, I forgot that the generic RAM save path in migration also takes the RCU
read lock here. So it's definitely even more challenging (we already hold
RCU for the ramblocks there, though).
>
> In short, so far I don't know whether it's possible to have all paths take
> the BQL without regressing anything.
>
> >
> > TBH, so far I thought that any memory_listeners register/unregistering
> > *requires* the BQL, and everything else is a BUG.
>
> Thanks,
>
> --
> Peter Xu
--
Peter Xu
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-03-02 14:45 ` Peter Xu
2023-03-02 14:56 ` Peter Xu
@ 2023-03-02 15:11 ` David Hildenbrand
2023-03-02 21:50 ` Peter Xu
1 sibling, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2023-03-02 15:11 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Maxim Levitsky, Stefan Hajnoczi, Juan Quintela,
Paolo Bonzini, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
On 02.03.23 15:45, Peter Xu wrote:
> On Thu, Mar 02, 2023 at 10:46:56AM +0100, David Hildenbrand wrote:
>> On 25.02.23 17:31, Peter Xu wrote:
>>> [not for merging, but for discussion; this is something I found when
>>> looking at another issue on Chuang's optimization for migration downtime]
>>>
>>> Summary: we tried to access memory_listeners, address_spaces, etc. in RCU
>>> way. However we didn't implement them with RCU-safety. This patchset is
>>> trying to do that; at least making it closer.
>>>
>>> NOTE! It's doing it wrongly for now, so please feel free to see this as a
>>> thread to start discussing this problem, as in subject.
>>>
>>> The core problem here is how to make sure memory listeners will be freed in
>>> RCU ways, per when unlinking them from the global memory_listeners list.
>>
>> Can you elaborate why we would want to do that? Is there a real reason we
>> cannot hold the BQL when unregistering a listener?
>
> Yes, afaict we must hold the BQL when unregistering any listener for now. I
> added an explicit assert for that in patch 1.
>
Oh, good!
> We want to do that because potentially we have RCU readers accessing these
> two lists, so here taking BQL only is not enough. We need to release the
> objects after all users are gone.
>
> We already do that for address spaces, but afaict the listener part was
> overlooked. The challenge here is how to achieve the same for listeners.
Ouch, ok thanks.
>
>>
>> Or could we use any other, more fine-grained, lock to protect the memory
>> listeners?
>>
>> Naive me would think that any interactions between someone updating the
>> memory listeners, and a listener getting removed, would require some careful
>> synchronization (to not rip a notifier out while someone else notifies --
>> what is the still registered notifier supposed to do with notifications
>> while it is already going away?), instead of doing it via RCU.
>>
>> I'm all for using RCU if it improves performance and keeps things simple. If
>> RCU is neither required for performance reason and overcomplicates the
>> implementation, maybe using locking is the better choice.
>
> For ASes, one major RCU user is memory_region_find_rcu().
>
> For listeners, the only path that doesn't take BQL (afaict) is
> memory_region_clear_dirty_bitmap(). Maybe you'll have some points here on
> the side effect of taking it because it's in either virtio-mem or balloon
> path for page hinting iirc.
So, we could end up in memory_region_clear_dirty_bitmap() when migration
starts (clearing the initial bitmap), while migration is happening
(migrating one page), and during virtio-balloon's qemu_guest_free_page_hint().
There should be no direct call from virtio-mem (anymore), only from
virtio-balloon. I don't think taking the BQL is particularly problematic
here.
I guess the main concern here would be the overhead from grabbing/releasing
the BQL very often, and blocking the BQL while we're eventually in the
kernel, clearing bitmaps, correct?
Indeed, memory listener registration/removal doesn't happen very
frequently, while traversing the listeners can happen very often in
migration code IIUC.
--
Thanks,
David / dhildenb
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-03-02 15:11 ` David Hildenbrand
@ 2023-03-02 21:50 ` Peter Xu
2023-03-03 9:10 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2023-03-02 21:50 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Maxim Levitsky, Stefan Hajnoczi, Juan Quintela,
Paolo Bonzini, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
On Thu, Mar 02, 2023 at 04:11:56PM +0100, David Hildenbrand wrote:
> I guess the main concern here would be the overhead from grabbing/releasing the
> BQL very often, and blocking the BQL while we're eventually in the kernel,
> clearing bitmaps, correct?
More or less, yes. I think it's pretty clear we should move on with RCU
unless locking is extremely necessary (which I don't think it is..), then
it's about how to fix the bug so that RCU safety is guaranteed.
--
Peter Xu
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-03-02 21:50 ` Peter Xu
@ 2023-03-03 9:10 ` David Hildenbrand
2023-03-03 16:20 ` Peter Xu
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2023-03-03 9:10 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Maxim Levitsky, Stefan Hajnoczi, Juan Quintela,
Paolo Bonzini, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
On 02.03.23 22:50, Peter Xu wrote:
> On Thu, Mar 02, 2023 at 04:11:56PM +0100, David Hildenbrand wrote:
>> I guess the main concern here would be the overhead from grabbing/releasing the
>> BQL very often, and blocking the BQL while we're eventually in the kernel,
>> clearing bitmaps, correct?
>
> More or less, yes. I think it's pretty clear we should move on with RCU
> unless locking is extremely necessary (which I don't think it is..), then
> it's about how to fix the bug so that RCU safety is guaranteed.
What about an additional simple lock?
Like:
* register/unregister requires that new notifier lock + BQL
* traversing notifiers requires either that new lock or the BQL
We simply take the new lock in that problematic function. That would
work as long as we don't require traversal of the notifiers concurrently
-- and as long as we don't have a lot of bouncing back and forth (I don't
think we do, even in the migration context, or am I wrong?).
That way we also make sure that each notifier is only called once. I'm
not 100% sure if all notifiers would expect to be called concurrently.
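If it helps the discussion, the rule above could be sketched like this (a toy model with invented names; in QEMU the first mutex would be the BQL and the list would be memory_listeners). Register/unregister take both locks, so a traversal holding either one of them always excludes a concurrent unregister:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical locks; in QEMU the first one would be the BQL. */
static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t listener_lock = PTHREAD_MUTEX_INITIALIZER;

static int nr_listeners; /* stand-in for the real listener list */

/* Register/unregister: hold BOTH locks (always in the same order, to
 * avoid deadlock), so that a traversal holding either one of them
 * cannot run concurrently. */
static void listener_register_locked(void)
{
    pthread_mutex_lock(&bql);
    pthread_mutex_lock(&listener_lock);
    nr_listeners++;
    pthread_mutex_unlock(&listener_lock);
    pthread_mutex_unlock(&bql);
}

static void listener_unregister_locked(void)
{
    pthread_mutex_lock(&bql);
    pthread_mutex_lock(&listener_lock);
    nr_listeners--;
    pthread_mutex_unlock(&listener_lock);
    pthread_mutex_unlock(&bql);
}

/* A traversal that already runs under the BQL needs nothing extra; a
 * BQL-free path (e.g. clearing dirty bitmaps) takes only the new lock.
 * Either way it excludes register/unregister. */
static int traverse_without_bql(void)
{
    int n;

    pthread_mutex_lock(&listener_lock);
    n = nr_listeners; /* walk the listener list here */
    pthread_mutex_unlock(&listener_lock);
    return n;
}
```

The trade-off, as noted above, is that this serializes all traversals that don't hold the BQL against each other, which RCU would not.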
--
Thanks,
David / dhildenb
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-03-03 9:10 ` David Hildenbrand
@ 2023-03-03 16:20 ` Peter Xu
2023-03-03 16:58 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2023-03-03 16:20 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Maxim Levitsky, Stefan Hajnoczi, Juan Quintela,
Paolo Bonzini, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
On Fri, Mar 03, 2023 at 10:10:12AM +0100, David Hildenbrand wrote:
> On 02.03.23 22:50, Peter Xu wrote:
> > On Thu, Mar 02, 2023 at 04:11:56PM +0100, David Hildenbrand wrote:
> > > I guess the main concern here would be the overhead from grabbing/releasing the
> > > BQL very often, and blocking the BQL while we're eventually in the kernel,
> > > clearing bitmaps, correct?
> >
> > More or less, yes. I think it's pretty clear we should move on with RCU
> > unless locking is extremely necessary (which I don't think it is..), then
> > it's about how to fix the bug so that RCU safety is guaranteed.
>
> What about an additional simple lock?
>
> Like:
>
> * register/unregister requires that new notifier lock + BQL
> * traversing notifiers requires either that new lock or the BQL
This would work, but it would also be a step backwards.
I think we shouldn't allow concurrency for notifiers, more below. It's
more that sometimes QEMU walks the two lists for reasons that have nothing
to do with notifiers (like memory_region_find_rcu()); that's the major
uncertainty to me. There are also future plans for using more RCU in QEMU code.
> We simply take the new lock in that problematic function. That would work as
> long as we don't require traversal of the notifiers concurrently -- and as
> long as we don't have a lot of bouncing back and forth (I don't think we do,
> even in the migration context, or am I wrong?).
>
> That way we also make sure that each notifier is only called once. I'm not
> 100% sure if all notifiers would expect to be called concurrently.
Yes, I think so. AFAIU most of the notifiers should only be called with the
BQL held, so they'll already be serialized (and the hooks normally have yet
another layer of protection, like KVM's).
Clear log is something special. Afaik it's protected by RAMState's
bitmap_mutex so far, but not always..
The inaccuracy is because clear log can also be triggered outside migration,
where there's no bitmap_mutex context.
But AFAICT concurrent clear log is also fine, because it was (somewhat
tailored...) for kvm, so it'll be serialized at kvm_slots_lock() anyway.
We'll need to be careful when growing log_clear support, though.
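As a toy model of why the kvm side stays safe (names invented; the real serialization point is kvm_slots_lock() in kvm-all.c): as long as every clear goes through the same lock, overlapping or repeated clears serialize, and the later one simply finds the bits already cleared.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Invented miniature of a KVM slot's dirty bitmap; the real code
 * serializes on kvm_slots_lock(). */
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t dirty_bitmap;

/* Clear the pages covered by @mask.  Because everything happens under
 * the same lock, two concurrent clears of overlapping ranges serialize
 * and the second one just sees the bits already clear -- no harm done. */
static uint64_t clear_dirty(uint64_t mask)
{
    uint64_t was_set;

    pthread_mutex_lock(&slots_lock);
    was_set = dirty_bitmap & mask;
    dirty_bitmap &= ~mask;
    pthread_mutex_unlock(&slots_lock);
    return was_set; /* the bits that were actually dirty */
}
```

The property the sketch relies on is that clearing is idempotent under the lock; that's what would need re-checking if log_clear grows callers that aren't funneled through one lock.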
--
Peter Xu
* Re: [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues
2023-03-03 16:20 ` Peter Xu
@ 2023-03-03 16:58 ` David Hildenbrand
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2023-03-03 16:58 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Maxim Levitsky, Stefan Hajnoczi, Juan Quintela,
Paolo Bonzini, Dr . David Alan Gilbert, Chuang Xu,
Philippe Mathieu-Daudé
On 03.03.23 17:20, Peter Xu wrote:
> On Fri, Mar 03, 2023 at 10:10:12AM +0100, David Hildenbrand wrote:
>> On 02.03.23 22:50, Peter Xu wrote:
>>> On Thu, Mar 02, 2023 at 04:11:56PM +0100, David Hildenbrand wrote:
>>>> I guess the main concern here would be the overhead from grabbing/releasing the
>>>> BQL very often, and blocking the BQL while we're eventually in the kernel,
>>>> clearing bitmaps, correct?
>>>
>>> More or less, yes. I think it's pretty clear we should move on with RCU
>>> unless locking is extremely necessary (which I don't think it is..), then
>>> it's about how to fix the bug so that RCU safety is guaranteed.
>>
>> What about an additional simple lock?
>>
>> Like:
>>
>> * register/unregister requires that new notifier lock + BQL
>> * traversing notifiers requires either that new lock or the BQL
>
> This would work, but it would also be a step backwards.
>
> I think we shouldn't allow concurrency for notifiers, more below. It's
> more that sometimes QEMU walks the two lists for reasons that have nothing
> to do with notifiers (like memory_region_find_rcu()); that's the major
> uncertainty to me. There are also future plans for using more RCU in QEMU code.
>
>> We simply take the new lock in that problematic function. That would work as
>> long as we don't require traversal of the notifiers concurrently -- and as
>> long as we don't have a lot of bouncing back and forth (I don't think we do,
>> even in the migration context, or am I wrong?).
>>
>> That way we also make sure that each notifier is only called once. I'm not
>> 100% sure if all notifiers would expect to be called concurrently.
>
> Yes, I think so. AFAIU most of the notifiers should only be called with the
> BQL held, so they'll already be serialized (and the hooks normally have yet
> another layer of protection, like KVM's).
>
> Clear log is something special. Afaik it's protected by RAMState's
> bitmap_mutex so far, but not always..
>
> The inaccuracy is because clear log can also be triggered outside migration,
> where there's no bitmap_mutex context.
>
> But AFAICT concurrent clear log is also fine because it was (somehow
> tailored...) for kvm, so it'll anyway be serialized at kvm_slots_lock().
> We'll need to be careful when growing log_clear support, though.
>
On a related note, I was wondering if we should tackle this from a
different direction and not care about locking at all for this special
migration case.
The thing is, during migration most operations are (or should be)
disabled. Consequently, I would expect that it's very rare that we even
get a register/unregister while migration is running. Anything that
might do one could already indicate a potential problem.
For example, device hotplug/unplug should be forbidden while migration
is happening.
guest_phys_blocks_append() temporarily registers a listener. IIRC, we
already disable memory dumping while migration is active. From what I
can tell, qmp_dump_skeys() and tpm_ppi_reset() could still call it, though.
Do we have any other known call paths that are problematic while
migration is active? guest_phys_blocks_append() could easily be
re-implemented to handle this without a temporary listener
registration.
There are not too many calls of memory_listener_unregister().
--
Thanks,
David / dhildenb
end of thread [~2023-03-03 16:59 UTC]
Thread overview: 15+ messages
2023-02-25 16:31 [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Peter Xu
2023-02-25 16:31 ` [PATCH RFC 1/4] memory: Make memory_listeners RCU-safe for real Peter Xu
2023-02-25 16:31 ` [PATCH RFC 2/4] memory: Use rcu list variance for address_spaces modifications Peter Xu
2023-02-25 16:31 ` [PATCH RFC 3/4] memory: Protect memory_region_clear_dirty_bitmap with RCU Peter Xu
2023-02-25 16:31 ` [PATCH RFC 4/4] memory: Use rcu traversal in memory_region_to_address_space Peter Xu
2023-03-01 0:09 ` [PATCH RFC 0/4] memory: Fix (/ Discuss) a few rcu issues Stefan Hajnoczi
2023-03-01 16:08 ` Peter Xu
2023-03-02 9:46 ` David Hildenbrand
2023-03-02 14:45 ` Peter Xu
2023-03-02 14:56 ` Peter Xu
2023-03-02 15:11 ` David Hildenbrand
2023-03-02 21:50 ` Peter Xu
2023-03-03 9:10 ` David Hildenbrand
2023-03-03 16:20 ` Peter Xu
2023-03-03 16:58 ` David Hildenbrand