* [PATCH 1/2] softmmu/physmem: move last_ram_page() call under qemu_mutex_lock_ramlist()

From: Andrey Ryabinin @ 2022-03-25 15:40 UTC
To: Paolo Bonzini, Peter Xu, David Hildenbrand, Philippe Mathieu-Daudé, Stefan Hajnoczi
Cc: Andrey Ryabinin, qemu-devel, yc-core, qemu-stable

Modifications of 'ram_list.blocks' are protected by 'ram_list.mutex'.
last_ram_page() uses the state of 'ram_list.blocks' to determine the RAM
size. ram_block_add() calls last_ram_page() before taking the mutex, making
the following race possible:

    CPU#0                                        CPU#1
    ram_block_add()
      old_ram_size = last_ram_page()
      qemu_mutex_lock_ramlist()
      ...
      dirty_memory_extend(old_ram_size, new_ram_size);
                                                 ram_block_add()
                                                   old_ram_size = last_ram_page()
      // insert block into ram_list
      QLIST_INSERT_*_RCU()
      qemu_mutex_unlock_ramlist()
                                                 qemu_mutex_lock_ramlist()
                                                 ....
                                                 dirty_memory_extend(old_ram_size, new_ram_size);

Such a race may result in leaking some dirty memory bitmaps. Because of the
stale 'old_ram_size' value, dirty_memory_extend() on CPU#1 will allocate and
reinitialize some of the dirty memory bitmap blocks already allocated on
CPU#0.
Fix this by moving the last_ram_page() call under qemu_mutex_lock_ramlist().

Cc: qemu-stable@nongnu.org
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
 softmmu/physmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 4e1b27a20e..32f76362bf 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1969,9 +1969,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     ram_addr_t old_ram_size, new_ram_size;
     Error *err = NULL;
 
+    qemu_mutex_lock_ramlist();
     old_ram_size = last_ram_page();
 
-    qemu_mutex_lock_ramlist();
     new_block->offset = find_ram_offset(new_block->max_length);
 
     if (!new_block->host) {
-- 
2.34.1
* [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak

From: Andrey Ryabinin @ 2022-03-25 15:40 UTC
To: Paolo Bonzini, Peter Xu, David Hildenbrand, Philippe Mathieu-Daudé, Stefan Hajnoczi
Cc: Andrey Ryabinin, qemu-devel, yc-core, qemu-stable

The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
function calls leads to leaking some memory.

ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
for new memory. These blocks only grow but never shrink. So
qemu_ram_free() restores the RAM size back to its original state but
doesn't touch the dirty memory bitmaps.

After qemu_ram_free() there is no way of knowing that we have
allocated dirty memory bitmaps beyond the current RAM size.
So the next ram_block_add() will call dirty_memory_extend() again
to allocate new bitmaps and rewrite the pointers to bitmaps left after
the first ram_block_add()/dirty_memory_extend() calls.

Rework dirty_memory_extend() to be able to shrink dirty maps,
also rename it to dirty_memory_resize(). And fix the leak by
shrinking dirty memory maps in qemu_ram_free() if needed.

Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
Cc: qemu-stable@nongnu.org
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
 include/exec/ramlist.h |  2 ++
 softmmu/physmem.c      | 38 ++++++++++++++++++++++++++++++++------
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 2ad2a81acc..019e238e7c 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -41,6 +41,8 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
 #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
 typedef struct {
     struct rcu_head rcu;
+    unsigned int nr_blocks;
+    unsigned int nr_blocks_inuse;
     unsigned long *blocks[];
 } DirtyMemoryBlocks;

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 32f76362bf..073ab37351 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1919,8 +1919,23 @@ void qemu_ram_msync(RAMBlock *block, ram_addr_t start, ram_addr_t length)
     }
 }
 
+static void dirty_memory_free(DirtyMemoryBlocks *blocks)
+{
+    int i;
+
+    /*
+     * 'nr_blocks_inuse' is either more than 'nr_blocks' (memory was
+     * extended) or less than 'nr_blocks' (memory shrunk). In the second
+     * case we free all the blocks above 'nr_blocks_inuse'.
+     */
+    for (i = blocks->nr_blocks_inuse; i < blocks->nr_blocks; i++) {
+        g_free(blocks->blocks[i]);
+    }
+    g_free(blocks);
+}
+
 /* Called with ram_list.mutex held */
-static void dirty_memory_extend(ram_addr_t old_ram_size,
+static void dirty_memory_resize(ram_addr_t old_ram_size,
                                 ram_addr_t new_ram_size)
 {
     ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
@@ -1929,25 +1944,28 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
                                              DIRTY_MEMORY_BLOCK_SIZE);
     int i;
 
-    /* Only need to extend if block count increased */
-    if (new_num_blocks <= old_num_blocks) {
+    /* Only need to resize if block count changed */
+    if (new_num_blocks == old_num_blocks) {
         return;
     }
 
     for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
         DirtyMemoryBlocks *old_blocks;
         DirtyMemoryBlocks *new_blocks;
+        unsigned int num_blocks = MAX(old_num_blocks, new_num_blocks);
         int j;
 
         old_blocks = qatomic_rcu_read(&ram_list.dirty_memory[i]);
         new_blocks = g_malloc(sizeof(*new_blocks) +
-                              sizeof(new_blocks->blocks[0]) * new_num_blocks);
+                              sizeof(new_blocks->blocks[0]) * num_blocks);
+        new_blocks->nr_blocks = new_num_blocks;
 
         if (old_num_blocks) {
             memcpy(new_blocks->blocks, old_blocks->blocks,
                    old_num_blocks * sizeof(old_blocks->blocks[0]));
         }
 
+        /* memory extend case (new > old): allocate new blocks */
         for (j = old_num_blocks; j < new_num_blocks; j++) {
             new_blocks->blocks[j] = bitmap_new(DIRTY_MEMORY_BLOCK_SIZE);
         }
@@ -1955,7 +1973,8 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
         qatomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
 
         if (old_blocks) {
-            g_free_rcu(old_blocks, rcu);
+            old_blocks->nr_blocks_inuse = new_num_blocks;
+            call_rcu(old_blocks, dirty_memory_free, rcu);
         }
     }
 }
@@ -2001,7 +2020,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     new_ram_size = MAX(old_ram_size,
                        (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
     if (new_ram_size > old_ram_size) {
-        dirty_memory_extend(old_ram_size, new_ram_size);
+        dirty_memory_resize(old_ram_size, new_ram_size);
     }
     /* Keep the list sorted from biggest to smallest block. Unlike QTAILQ,
      * QLIST (which has an RCU-friendly variant) does not have insertion at
@@ -2218,6 +2237,8 @@ static void reclaim_ramblock(RAMBlock *block)
 
 void qemu_ram_free(RAMBlock *block)
 {
+    ram_addr_t old_ram_size, new_ram_size;
+
     if (!block) {
         return;
     }
@@ -2228,12 +2249,17 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     qemu_mutex_lock_ramlist();
+    old_ram_size = last_ram_page();
+
     QLIST_REMOVE_RCU(block, next);
     ram_list.mru_block = NULL;
     /* Write list before version */
     smp_wmb();
     ram_list.version++;
     call_rcu(block, reclaim_ramblock, rcu);
+
+    new_ram_size = last_ram_page();
+    dirty_memory_resize(old_ram_size, new_ram_size);
     qemu_mutex_unlock_ramlist();
 }
-- 
2.34.1
* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak

From: Peter Xu @ 2022-03-30 19:25 UTC
To: Andrey Ryabinin
Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel, Philippe Mathieu-Daudé, yc-core, Paolo Bonzini

On Fri, Mar 25, 2022 at 06:40:13PM +0300, Andrey Ryabinin wrote:
> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
> function calls leads to leaking some memory.
[...]
> diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
> index 2ad2a81acc..019e238e7c 100644
> --- a/include/exec/ramlist.h
> +++ b/include/exec/ramlist.h
> @@ -41,6 +41,8 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
>  #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
>  typedef struct {
>      struct rcu_head rcu;
> +    unsigned int nr_blocks;
> +    unsigned int nr_blocks_inuse;
>      unsigned long *blocks[];
>  } DirtyMemoryBlocks;

The problem looks valid, but can we avoid introducing these variables at
all?

IIUC what we miss here is the proper releasing of dirty blocks when ram is
released. IMHO as long as we properly release the extra dirty memory
blocks in qemu_ram_free(), then last_ram_page() will reflect the existing
dirty memory block size. That looks simpler if I'm correct..

Side question: both of the patches are not regressions introduced in this
release, right? So they are targeting the next release?

Thanks,

-- 
Peter Xu
* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak

From: Andrey Ryabinin @ 2022-03-31 16:14 UTC
To: Peter Xu
Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel, Philippe Mathieu-Daudé, yc-core, Paolo Bonzini

On 3/30/22 22:25, Peter Xu wrote:
> On Fri, Mar 25, 2022 at 06:40:13PM +0300, Andrey Ryabinin wrote:
[...]
>>  typedef struct {
>>      struct rcu_head rcu;
>> +    unsigned int nr_blocks;
>> +    unsigned int nr_blocks_inuse;
>>      unsigned long *blocks[];
>>  } DirtyMemoryBlocks;
>
> The problem looks valid, but can we avoid introducing these variables at
> all?
>
> IIUC what we miss here is the proper releasing of dirty blocks when ram is
> released. IMHO as long as we properly release the extra dirty memory
> blocks in qemu_ram_free(), then last_ram_page() will reflect the existing
> dirty memory block size. That looks simpler if I'm correct..

ram_list.dirty_memory is RCU protected, which means we can't free it
immediately. Freeing must be delayed until the end of the RCU grace period.
One way to do that is using an RCU callback, like in this patch. That's why
we need these variables: to pass the information into the RCU callback.

Another way is using synchronize_rcu() before freeing the memory. In that
case the variables won't be needed, but it's expensive. Also I'm not sure
whether using synchronize_rcu() under a mutex lock is a good idea. Perhaps
dirty_memory_resize() would need to be reworked somehow to move
synchronize_rcu() and the freeing outside of the mutex_lock()/unlock()
section.

> Side question: both of the patches are not regressions introduced in this
> release, right? So they are targeting the next release?

Right, these bugs are old. I'm not familiar with the qemu release
schedule/procedure, so whatever release is more appropriate.
* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak

From: Peter Xu @ 2022-03-31 17:34 UTC
To: Andrey Ryabinin
Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel, Philippe Mathieu-Daudé, yc-core, Paolo Bonzini

On Thu, Mar 31, 2022 at 07:14:12PM +0300, Andrey Ryabinin wrote:
> On 3/30/22 22:25, Peter Xu wrote:
[...]
> ram_list.dirty_memory is RCU protected, which means we can't free it
> immediately. Freeing must be delayed until the end of the RCU grace period.
> One way to do that is using an RCU callback, like in this patch. That's why
> we need these variables: to pass the information into the RCU callback.
>
> Another way is using synchronize_rcu() before freeing the memory. In that
> case the variables won't be needed, but it's expensive. Also I'm not sure
> whether using synchronize_rcu() under a mutex lock is a good idea. Perhaps
> dirty_memory_resize() would need to be reworked somehow to move
> synchronize_rcu() and the freeing outside of the mutex_lock()/unlock()
> section.

I suggested that just because at first glance I failed to read the current
patch clearly due to some trivial details (I commented inline), which made
me think that merging extend/shrink is not necessary.

But I agree it looks mostly correct, and call_rcu() should indeed be
preferred.

Thanks,

-- 
Peter Xu
* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak

From: David Hildenbrand @ 2022-03-31 8:37 UTC
To: Andrey Ryabinin, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, Stefan Hajnoczi
Cc: qemu-devel, yc-core, qemu-stable

On 25.03.22 16:40, Andrey Ryabinin wrote:
> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
> function calls leads to leaking some memory.
[...]
> Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>

I looked at this a while ago and I think the problem is more involved,
because we might actually generate holes for which we can free the
bitmap. I think this patch improves the situation, though.

IIRC if you hotplug two dimms and then hotunplug only the latter, the
bitmap for the first dimm will remain as long as the second dimm isn't
hotunplugged.

-- 
Thanks,

David / dhildenb
* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak

From: Peter Xu @ 2022-03-31 12:27 UTC
To: David Hildenbrand
Cc: Stefan Hajnoczi, qemu-stable, Andrey Ryabinin, Philippe Mathieu-Daudé, qemu-devel, yc-core, Paolo Bonzini

On Thu, Mar 31, 2022 at 10:37:39AM +0200, David Hildenbrand wrote:
> On 25.03.22 16:40, Andrey Ryabinin wrote:
[...]
> I looked at this a while ago and I think the problem is more involved,
> because we might actually generate holes for which we can free the
> bitmap. I think this patch improves the situation, though.
>
> IIRC if you hotplug two dimms and then hotunplug only the latter, the

I assume you meant "former"? :)

> bitmap for the first dimm will remain as long as the second dimm isn't
> hotunplugged.

IMHO it's fine to keep the dirty block for the unplugged hole. It'll be
better if we could free it, but we can fix the memory leak first, which
seems to be more severe. The dirty memory isn't extremely large (a 32K
ratio to mem size) if just kept idle, but frequent plug/unplug will leak
unbounded host memory.

-- 
Peter Xu
* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak

From: David Hildenbrand @ 2022-03-31 16:26 UTC
To: Peter Xu
Cc: Stefan Hajnoczi, qemu-stable, Andrey Ryabinin, Philippe Mathieu-Daudé, qemu-devel, yc-core, Paolo Bonzini

On 31.03.22 14:27, Peter Xu wrote:
> On Thu, Mar 31, 2022 at 10:37:39AM +0200, David Hildenbrand wrote:
[...]
>> IIRC if you hotplug two dimms and then hotunplug only the latter, the
>
> I assume you meant "former"? :)

I remember it would have to be the one "plugged first" :)

>> bitmap for the first dimm will remain as long as the second dimm isn't
>> hotunplugged.
>
> IMHO it's fine to keep the dirty block for the unplugged hole. It'll be
> better if we could free it, but we can fix the memory leak first, which
> seems to be more severe. The dirty memory isn't extremely large (a 32K
> ratio to mem size) if just kept idle, but frequent plug/unplug will leak
> unbounded host memory.

Oh, I see, thanks for clarifying.

-- 
Thanks,

David / dhildenb
* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak

From: Peter Xu @ 2022-03-31 17:29 UTC
To: Andrey Ryabinin
Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel, Philippe Mathieu-Daudé, yc-core, Paolo Bonzini

On Fri, Mar 25, 2022 at 06:40:13PM +0300, Andrey Ryabinin wrote:
> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
> function calls leads to leaking some memory.
[...]
> diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
> index 2ad2a81acc..019e238e7c 100644
> --- a/include/exec/ramlist.h
> +++ b/include/exec/ramlist.h
> @@ -41,6 +41,8 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
>  #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
>  typedef struct {
>      struct rcu_head rcu;
> +    unsigned int nr_blocks;

Nit: How about renaming it to nr_blocks_allocated? It's much harder to
identify this with the _inuse below otherwise.. It'll be great if there
were comments explaining the two fields.

> +    unsigned int nr_blocks_inuse;

If there'll be a comment, we should definitely mark out that this variable
is only set when a new array is about to replace this one. IOW, this field
is not valid during most of the lifecycle of this structure, iiuc. And
that's very not obvious..

>      unsigned long *blocks[];
>  } DirtyMemoryBlocks;
[...]
>     for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
>         DirtyMemoryBlocks *old_blocks;
>         DirtyMemoryBlocks *new_blocks;
> +       unsigned int num_blocks = MAX(old_num_blocks, new_num_blocks);
>         int j;
>
>         old_blocks = qatomic_rcu_read(&ram_list.dirty_memory[i]);
>         new_blocks = g_malloc(sizeof(*new_blocks) +
> -                             sizeof(new_blocks->blocks[0]) * new_num_blocks);
> +                             sizeof(new_blocks->blocks[0]) * num_blocks);
> +       new_blocks->nr_blocks = new_num_blocks;

Here new_num_blocks is passed to nr_blocks, however the allocation is with
size max(old, new). Shouldn't it still be new_num_blocks?

>
>         if (old_num_blocks) {
>             memcpy(new_blocks->blocks, old_blocks->blocks,
>                    old_num_blocks * sizeof(old_blocks->blocks[0]));
>         }

Here we copied over all the old pointers even if old > new.. If we
allocate the new array with new_num_blocks entries only, shouldn't we copy
min(old, new) here instead?

Thanks,

[...]

-- 
Peter Xu
* Re: [PATCH 1/2] softmmu/physmem: move last_ram_page() call under qemu_mutex_lock_ramlist()

From: Peter Xu @ 2022-03-30 18:47 UTC
To: Andrey Ryabinin
Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel, Philippe Mathieu-Daudé, yc-core, Paolo Bonzini

On Fri, Mar 25, 2022 at 06:40:12PM +0300, Andrey Ryabinin wrote:
> The 'ram_list.blocks' modifications are protected by 'ram_list.mutex'.
[...]
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu