* [PATCH 1/2] softmmu/physmem: move last_ram_page() call under qemu_mutex_lock_ramlist()
@ 2022-03-25 15:40 Andrey Ryabinin
  2022-03-25 15:40 ` [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak Andrey Ryabinin
  2022-03-30 18:47 ` [PATCH 1/2] softmmu/physmem: move last_ram_page() call under qemu_mutex_lock_ramlist() Peter Xu
  0 siblings, 2 replies; 10+ messages in thread
From: Andrey Ryabinin @ 2022-03-25 15:40 UTC
  To: Paolo Bonzini, Peter Xu, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Stefan Hajnoczi
  Cc: Andrey Ryabinin, qemu-devel, yc-core, qemu-stable

Modifications of 'ram_list.blocks' are protected by 'ram_list.mutex'.
last_ram_page() uses the state of 'ram_list.blocks' to determine the RAM size.
ram_block_add() calls last_ram_page() before taking the mutex,
making the following race possible:

     CPU#0                                       CPU#1
                                      ram_block_add()
                                         old_ram_size = last_ram_page()
                                         qemu_mutex_lock_ramlist()
                                         ...
                                         dirty_memory_extend(old_ram_size, new_ram_size);
ram_block_add()
   old_ram_size = last_ram_page()

                                         // insert block into ram_list
                                         QLIST_INSERT_*_RCU()
                                         qemu_mutex_unlock_ramlist()
   qemu_mutex_lock_ramlist()
   ...
   dirty_memory_extend(old_ram_size, new_ram_size);

Such a race may result in leaking some dirty memory bitmaps.

Because of the stale 'old_ram_size' value, dirty_memory_extend() on CPU#0
will allocate and reinitialize some of the dirty memory bitmap blocks
already allocated on CPU#1.

Fix this by moving the last_ram_page() call under qemu_mutex_lock_ramlist().

Cc: qemu-stable@nongnu.org
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
 softmmu/physmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 4e1b27a20e..32f76362bf 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1969,9 +1969,9 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     ram_addr_t old_ram_size, new_ram_size;
     Error *err = NULL;
 
+    qemu_mutex_lock_ramlist();
     old_ram_size = last_ram_page();
 
-    qemu_mutex_lock_ramlist();
     new_block->offset = find_ram_offset(new_block->max_length);
 
     if (!new_block->host) {
-- 
2.34.1




* [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak
  2022-03-25 15:40 [PATCH 1/2] softmmu/physmem: move last_ram_page() call under qemu_mutex_lock_ramlist() Andrey Ryabinin
@ 2022-03-25 15:40 ` Andrey Ryabinin
  2022-03-30 19:25   ` Peter Xu
                     ` (2 more replies)
  2022-03-30 18:47 ` [PATCH 1/2] softmmu/physmem: move last_ram_page() call under qemu_mutex_lock_ramlist() Peter Xu
  1 sibling, 3 replies; 10+ messages in thread
From: Andrey Ryabinin @ 2022-03-25 15:40 UTC
  To: Paolo Bonzini, Peter Xu, David Hildenbrand,
	Philippe Mathieu-Daudé,
	Stefan Hajnoczi
  Cc: Andrey Ryabinin, qemu-devel, yc-core, qemu-stable

The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
function calls leads to leaking some memory.

ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
for the new memory. These blocks only ever grow, never shrink, so
qemu_ram_free() restores the RAM size back to its original state but
doesn't touch the dirty memory bitmaps.

After qemu_ram_free() there is no way of knowing that dirty memory
bitmaps were allocated beyond the current RAM size, so the next
ram_block_add() will call dirty_memory_extend() again to allocate new
bitmaps, overwriting the pointers to the bitmaps left over from the
first ram_block_add()/dirty_memory_extend() calls.

Rework dirty_memory_extend() so it can also shrink the dirty memory
maps, and rename it to dirty_memory_resize(). Fix the leak by
shrinking the dirty memory maps in qemu_ram_free() when needed.
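A sketch of the leaking call sequence (illustrative; the comments are mine,
not from a real trace):

    qemu_ram_alloc(...);   /* ram_block_add() -> dirty_memory_extend()
                            * allocates bitmap blocks for the new pages */
    qemu_ram_free(block);  /* RAM size shrinks back, but the bitmap
                            * blocks stay allocated */
    qemu_ram_alloc(...);   /* dirty_memory_extend() runs again and
                            * overwrites the old pointers -> leak */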

Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
Cc: qemu-stable@nongnu.org
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
 include/exec/ramlist.h |  2 ++
 softmmu/physmem.c      | 38 ++++++++++++++++++++++++++++++++------
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
index 2ad2a81acc..019e238e7c 100644
--- a/include/exec/ramlist.h
+++ b/include/exec/ramlist.h
@@ -41,6 +41,8 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
 #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
 typedef struct {
     struct rcu_head rcu;
+    unsigned int nr_blocks;
+    unsigned int nr_blocks_inuse;
     unsigned long *blocks[];
 } DirtyMemoryBlocks;
 
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 32f76362bf..073ab37351 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -1919,8 +1919,23 @@ void qemu_ram_msync(RAMBlock *block, ram_addr_t start, ram_addr_t length)
     }
 }
 
+static void dirty_memory_free(DirtyMemoryBlocks *blocks)
+{
+    int i;
+
+    /*
+     * 'nr_blocks_inuse' is either greater than 'nr_blocks' (memory was
+     * extended) or less than 'nr_blocks' (memory shrunk). In the latter
+     * case, free all the blocks above 'nr_blocks_inuse'.
+     */
+    for (i = blocks->nr_blocks_inuse; i < blocks->nr_blocks; i++) {
+        g_free(blocks->blocks[i]);
+    }
+    g_free(blocks);
+}
+
 /* Called with ram_list.mutex held */
-static void dirty_memory_extend(ram_addr_t old_ram_size,
+static void dirty_memory_resize(ram_addr_t old_ram_size,
                                 ram_addr_t new_ram_size)
 {
     ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
@@ -1929,25 +1944,28 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
                                              DIRTY_MEMORY_BLOCK_SIZE);
     int i;
 
-    /* Only need to extend if block count increased */
-    if (new_num_blocks <= old_num_blocks) {
+    /* Only need to resize if block count changed */
+    if (new_num_blocks == old_num_blocks) {
         return;
     }
 
     for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
         DirtyMemoryBlocks *old_blocks;
         DirtyMemoryBlocks *new_blocks;
+        unsigned int num_blocks = MAX(old_num_blocks, new_num_blocks);
         int j;
 
         old_blocks = qatomic_rcu_read(&ram_list.dirty_memory[i]);
         new_blocks = g_malloc(sizeof(*new_blocks) +
-                              sizeof(new_blocks->blocks[0]) * new_num_blocks);
+                              sizeof(new_blocks->blocks[0]) * num_blocks);
+        new_blocks->nr_blocks = new_num_blocks;
 
         if (old_num_blocks) {
             memcpy(new_blocks->blocks, old_blocks->blocks,
                    old_num_blocks * sizeof(old_blocks->blocks[0]));
         }
 
+        /* memory extend case (new > old): allocate new blocks */
         for (j = old_num_blocks; j < new_num_blocks; j++) {
             new_blocks->blocks[j] = bitmap_new(DIRTY_MEMORY_BLOCK_SIZE);
         }
@@ -1955,7 +1973,8 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
         qatomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
 
         if (old_blocks) {
-            g_free_rcu(old_blocks, rcu);
+            old_blocks->nr_blocks_inuse = new_num_blocks;
+            call_rcu(old_blocks, dirty_memory_free, rcu);
         }
     }
 }
@@ -2001,7 +2020,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
     new_ram_size = MAX(old_ram_size,
               (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
     if (new_ram_size > old_ram_size) {
-        dirty_memory_extend(old_ram_size, new_ram_size);
+        dirty_memory_resize(old_ram_size, new_ram_size);
     }
     /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
      * QLIST (which has an RCU-friendly variant) does not have insertion at
@@ -2218,6 +2237,8 @@ static void reclaim_ramblock(RAMBlock *block)
 
 void qemu_ram_free(RAMBlock *block)
 {
+    ram_addr_t old_ram_size, new_ram_size;
+
     if (!block) {
         return;
     }
@@ -2228,12 +2249,17 @@ void qemu_ram_free(RAMBlock *block)
     }
 
     qemu_mutex_lock_ramlist();
+    old_ram_size = last_ram_page();
+
     QLIST_REMOVE_RCU(block, next);
     ram_list.mru_block = NULL;
     /* Write list before version */
     smp_wmb();
     ram_list.version++;
     call_rcu(block, reclaim_ramblock, rcu);
+
+    new_ram_size = last_ram_page();
+    dirty_memory_resize(old_ram_size, new_ram_size);
     qemu_mutex_unlock_ramlist();
 }
 
-- 
2.34.1




* Re: [PATCH 1/2] softmmu/physmem: move last_ram_page() call under qemu_mutex_lock_ramlist()
  2022-03-25 15:40 [PATCH 1/2] softmmu/physmem: move last_ram_page() call under qemu_mutex_lock_ramlist() Andrey Ryabinin
  2022-03-25 15:40 ` [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak Andrey Ryabinin
@ 2022-03-30 18:47 ` Peter Xu
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-03-30 18:47 UTC
  To: Andrey Ryabinin
  Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel,
	Philippe Mathieu-Daudé,
	yc-core, Paolo Bonzini

On Fri, Mar 25, 2022 at 06:40:12PM +0300, Andrey Ryabinin wrote:
> Modifications of 'ram_list.blocks' are protected by 'ram_list.mutex'.
> last_ram_page() uses the state of 'ram_list.blocks' to determine the RAM size.
> ram_block_add() calls last_ram_page() before taking the mutex,
> making the following race possible:
> 
>      CPU#0                                       CPU#1
>                                       ram_block_add()
>                                          old_ram_size = last_ram_page()
>                                          qemu_mutex_lock_ramlist()
>                                          ...
>                                          dirty_memory_extend(old_ram_size, new_ram_size);
> ram_block_add()
>    old_ram_size = last_ram_page()
> 
>                                          // insert block into ram_list
>                                          QLIST_INSERT_*_RCU()
>                                          qemu_mutex_unlock_ramlist()
>    qemu_mutex_lock_ramlist()
>    ...
>    dirty_memory_extend(old_ram_size, new_ram_size);
> 
> Such a race may result in leaking some dirty memory bitmaps.
> 
> Because of the stale 'old_ram_size' value, dirty_memory_extend() on CPU#0
> will allocate and reinitialize some of the dirty memory bitmap blocks
> already allocated on CPU#1.
> 
> Fix this by moving the last_ram_page() call under qemu_mutex_lock_ramlist().
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak
  2022-03-25 15:40 ` [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak Andrey Ryabinin
@ 2022-03-30 19:25   ` Peter Xu
  2022-03-31 16:14     ` Andrey Ryabinin
  2022-03-31  8:37   ` David Hildenbrand
  2022-03-31 17:29   ` Peter Xu
  2 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-03-30 19:25 UTC
  To: Andrey Ryabinin
  Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel,
	Philippe Mathieu-Daudé,
	yc-core, Paolo Bonzini

On Fri, Mar 25, 2022 at 06:40:13PM +0300, Andrey Ryabinin wrote:
> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
> function calls leads to leaking some memory.
> 
> ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
> for the new memory. These blocks only ever grow, never shrink, so
> qemu_ram_free() restores the RAM size back to its original state but
> doesn't touch the dirty memory bitmaps.
> 
> After qemu_ram_free() there is no way of knowing that dirty memory
> bitmaps were allocated beyond the current RAM size, so the next
> ram_block_add() will call dirty_memory_extend() again to allocate new
> bitmaps, overwriting the pointers to the bitmaps left over from the
> first ram_block_add()/dirty_memory_extend() calls.
> 
> Rework dirty_memory_extend() so it can also shrink the dirty memory
> maps, and rename it to dirty_memory_resize(). Fix the leak by
> shrinking the dirty memory maps in qemu_ram_free() when needed.
> 
> Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
>  include/exec/ramlist.h |  2 ++
>  softmmu/physmem.c      | 38 ++++++++++++++++++++++++++++++++------
>  2 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
> index 2ad2a81acc..019e238e7c 100644
> --- a/include/exec/ramlist.h
> +++ b/include/exec/ramlist.h
> @@ -41,6 +41,8 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
>  #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
>  typedef struct {
>      struct rcu_head rcu;
> +    unsigned int nr_blocks;
> +    unsigned int nr_blocks_inuse;
>      unsigned long *blocks[];
>  } DirtyMemoryBlocks;

The problem looks valid, but can we avoid introducing these variables at
all?

IIUC what we're missing here is the proper releasing of dirty blocks when
RAM is released.  IMHO as long as we properly release the extra dirty
memory blocks in qemu_ram_free(), then last_ram_page() will reflect the
existing dirty memory block size.  That looks simpler, if I'm correct.

Side question: neither of these patches fixes a regression introduced in
this release, right?  So they are targeting the next release?

Thanks,

-- 
Peter Xu




* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak
  2022-03-25 15:40 ` [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak Andrey Ryabinin
  2022-03-30 19:25   ` Peter Xu
@ 2022-03-31  8:37   ` David Hildenbrand
  2022-03-31 12:27     ` Peter Xu
  2022-03-31 17:29   ` Peter Xu
  2 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2022-03-31  8:37 UTC
  To: Andrey Ryabinin, Paolo Bonzini, Peter Xu,
	Philippe Mathieu-Daudé,
	Stefan Hajnoczi
  Cc: qemu-devel, yc-core, qemu-stable

On 25.03.22 16:40, Andrey Ryabinin wrote:
> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
> function calls leads to leaking some memory.
> 
> ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
> for the new memory. These blocks only ever grow, never shrink, so
> qemu_ram_free() restores the RAM size back to its original state but
> doesn't touch the dirty memory bitmaps.
> 
> After qemu_ram_free() there is no way of knowing that dirty memory
> bitmaps were allocated beyond the current RAM size, so the next
> ram_block_add() will call dirty_memory_extend() again to allocate new
> bitmaps, overwriting the pointers to the bitmaps left over from the
> first ram_block_add()/dirty_memory_extend() calls.
> 
> Rework dirty_memory_extend() so it can also shrink the dirty memory
> maps, and rename it to dirty_memory_resize(). Fix the leak by
> shrinking the dirty memory maps in qemu_ram_free() when needed.
> 
> Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>

I looked at this a while ago, and I think the problem is more involved,
because we might actually generate holes for which we can free the
bitmap. I think this patch improves the situation, though.


IIRC if you hotplug two dimms and then hotunplug only the latter, the
bitmap for the first dimm will remain as long as the second dimm isn't
hotunplugged.


-- 
Thanks,

David / dhildenb




* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak
  2022-03-31  8:37   ` David Hildenbrand
@ 2022-03-31 12:27     ` Peter Xu
  2022-03-31 16:26       ` David Hildenbrand
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Xu @ 2022-03-31 12:27 UTC
  To: David Hildenbrand
  Cc: Stefan Hajnoczi, qemu-stable, Andrey Ryabinin,
	Philippe Mathieu-Daudé,
	qemu-devel, yc-core, Paolo Bonzini

On Thu, Mar 31, 2022 at 10:37:39AM +0200, David Hildenbrand wrote:
> On 25.03.22 16:40, Andrey Ryabinin wrote:
> > The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
> > function calls leads to leaking some memory.
> > 
> > ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
> > for the new memory. These blocks only ever grow, never shrink, so
> > qemu_ram_free() restores the RAM size back to its original state but
> > doesn't touch the dirty memory bitmaps.
> > 
> > After qemu_ram_free() there is no way of knowing that dirty memory
> > bitmaps were allocated beyond the current RAM size, so the next
> > ram_block_add() will call dirty_memory_extend() again to allocate new
> > bitmaps, overwriting the pointers to the bitmaps left over from the
> > first ram_block_add()/dirty_memory_extend() calls.
> > 
> > Rework dirty_memory_extend() so it can also shrink the dirty memory
> > maps, and rename it to dirty_memory_resize(). Fix the leak by
> > shrinking the dirty memory maps in qemu_ram_free() when needed.
> > 
> > Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
> > Cc: qemu-stable@nongnu.org
> > Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> 
> I looked at this a while ago, and I think the problem is more involved,
> because we might actually generate holes for which we can free the
> bitmap. I think this patch improves the situation, though.
> 
> 
> IIRC if you hotplug two dimms and then hotunplug only the latter, the

I assume you meant "former"? :)

> bitmap for the first dimm will remain as long as the second dimm isn't
> hotunplugged.

IMHO it's fine to keep the dirty block for the unplugged hole.  It'd be
better if we could free it, but we can fix the memory leak first, which
seems more severe.  The dirty memory isn't extremely large (a 32K:1
RAM-to-bitmap ratio) if it's just kept idle, but frequent plug/unplug will
leak unbounded host memory.
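
For reference, a rough sketch of that ratio (my arithmetic, assuming 4KiB
target pages; the constants approximate QEMU's definitions):

    /* One bitmap bit per target page, per bitmap type, so the per-type
     * overhead is mem_size / (4096 * 8): one bitmap byte per 32KiB of
     * guest RAM. */
    #define TARGET_PAGE_SIZE  4096
    #define BITS_PER_BYTE     8
    #define DIRTY_MEMORY_NUM  3   /* VGA, CODE, MIGRATION */

    static size_t dirty_bitmap_bytes(size_t guest_ram_bytes)
    {
        /* e.g. 32GiB of guest RAM -> 3 * 1MiB of bitmaps */
        return DIRTY_MEMORY_NUM *
               (guest_ram_bytes / (TARGET_PAGE_SIZE * BITS_PER_BYTE));
    }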

-- 
Peter Xu




* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak
  2022-03-30 19:25   ` Peter Xu
@ 2022-03-31 16:14     ` Andrey Ryabinin
  2022-03-31 17:34       ` Peter Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Andrey Ryabinin @ 2022-03-31 16:14 UTC
  To: Peter Xu
  Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel,
	Philippe Mathieu-Daudé,
	yc-core, Paolo Bonzini



On 3/30/22 22:25, Peter Xu wrote:
> On Fri, Mar 25, 2022 at 06:40:13PM +0300, Andrey Ryabinin wrote:
>> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
>> function calls leads to leaking some memory.
>>
>> ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
>> for the new memory. These blocks only ever grow, never shrink, so
>> qemu_ram_free() restores the RAM size back to its original state but
>> doesn't touch the dirty memory bitmaps.
>>
>> After qemu_ram_free() there is no way of knowing that dirty memory
>> bitmaps were allocated beyond the current RAM size, so the next
>> ram_block_add() will call dirty_memory_extend() again to allocate new
>> bitmaps, overwriting the pointers to the bitmaps left over from the
>> first ram_block_add()/dirty_memory_extend() calls.
>>
>> Rework dirty_memory_extend() so it can also shrink the dirty memory
>> maps, and rename it to dirty_memory_resize(). Fix the leak by
>> shrinking the dirty memory maps in qemu_ram_free() when needed.
>>
>> Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>> ---
>>  include/exec/ramlist.h |  2 ++
>>  softmmu/physmem.c      | 38 ++++++++++++++++++++++++++++++++------
>>  2 files changed, 34 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
>> index 2ad2a81acc..019e238e7c 100644
>> --- a/include/exec/ramlist.h
>> +++ b/include/exec/ramlist.h
>> @@ -41,6 +41,8 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
>>  #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
>>  typedef struct {
>>      struct rcu_head rcu;
>> +    unsigned int nr_blocks;
>> +    unsigned int nr_blocks_inuse;
>>      unsigned long *blocks[];
>>  } DirtyMemoryBlocks;
> 
> The problem looks valid, but can we avoid introducing these variables at
> all?
> 
> IIUC what we're missing here is the proper releasing of dirty blocks when
> RAM is released.  IMHO as long as we properly release the extra dirty
> memory blocks in qemu_ram_free(), then last_ram_page() will reflect the
> existing dirty memory block size.  That looks simpler, if I'm correct.
> 

ram_list.dirty_memory is RCU protected, which means we can't free it immediately.
Freeing must be delayed until the end of the RCU grace period.
One way to do it is using an RCU callback, like in this patch. That's why we need these
variables - to pass the information into the RCU callback.

Another way is using synchronize_rcu() before freeing the memory. In that case
the variables won't be needed. But it's expensive.
Also I'm not sure if using synchronize_rcu() under a mutex lock is a good idea.
Perhaps dirty_memory_resize() would need to be reworked somehow to move
synchronize_rcu() and the freeing outside of the mutex_lock()/unlock() section.
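
Roughly, the two options look like this (a sketch, not the exact patch code):

    /* (a) asynchronous: publish the new array, then defer freeing the
     * old one to an RCU callback.  Cheap for the writer, but the
     * callback needs extra state (here: nr_blocks_inuse) to know what
     * to free. */
    qatomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
    old_blocks->nr_blocks_inuse = new_num_blocks;
    call_rcu(old_blocks, dirty_memory_free, rcu);

    /* (b) synchronous: wait out the grace period, then free inline.
     * No extra state needed, but it blocks the caller -- and here it
     * would wait while holding ram_list.mutex. */
    qatomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
    synchronize_rcu();
    dirty_memory_free(old_blocks);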

> Side question: neither of these patches fixes a regression introduced in
> this release, right?  So they are targeting the next release?
> 

Right, these bugs are old. I'm not familiar with the QEMU release
schedule/procedure, so whichever release is more appropriate.



* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak
  2022-03-31 12:27     ` Peter Xu
@ 2022-03-31 16:26       ` David Hildenbrand
  0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2022-03-31 16:26 UTC
  To: Peter Xu
  Cc: Stefan Hajnoczi, qemu-stable, Andrey Ryabinin,
	Philippe Mathieu-Daudé,
	qemu-devel, yc-core, Paolo Bonzini

On 31.03.22 14:27, Peter Xu wrote:
> On Thu, Mar 31, 2022 at 10:37:39AM +0200, David Hildenbrand wrote:
>> On 25.03.22 16:40, Andrey Ryabinin wrote:
>>> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
>>> function calls leads to leaking some memory.
>>>
>>> ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
>>> for the new memory. These blocks only ever grow, never shrink, so
>>> qemu_ram_free() restores the RAM size back to its original state but
>>> doesn't touch the dirty memory bitmaps.
>>>
>>> After qemu_ram_free() there is no way of knowing that dirty memory
>>> bitmaps were allocated beyond the current RAM size, so the next
>>> ram_block_add() will call dirty_memory_extend() again to allocate new
>>> bitmaps, overwriting the pointers to the bitmaps left over from the
>>> first ram_block_add()/dirty_memory_extend() calls.
>>>
>>> Rework dirty_memory_extend() so it can also shrink the dirty memory
>>> maps, and rename it to dirty_memory_resize(). Fix the leak by
>>> shrinking the dirty memory maps in qemu_ram_free() when needed.
>>>
>>> Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
>>> Cc: qemu-stable@nongnu.org
>>> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>>
>> I looked at this a while ago, and I think the problem is more involved,
>> because we might actually generate holes for which we can free the
>> bitmap. I think this patch improves the situation, though.
>>
>>
>> IIRC if you hotplug two dimms and then hotunplug only the latter, the
> 
> I assume you meant "former"? :)

I remember it would have to be the one "plugged first" :)

> 
>> bitmap for the first dimm will remain as long as the second dimm isn't
>> hotunplugged.
> 
> IMHO it's fine to keep the dirty block for the unplugged hole.  It'd be
> better if we could free it, but we can fix the memory leak first, which
> seems more severe.  The dirty memory isn't extremely large (a 32K:1
> RAM-to-bitmap ratio) if it's just kept idle, but frequent plug/unplug
> will leak unbounded host memory.

Oh, I see, thanks for clarifying.


-- 
Thanks,

David / dhildenb




* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak
  2022-03-25 15:40 ` [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak Andrey Ryabinin
  2022-03-30 19:25   ` Peter Xu
  2022-03-31  8:37   ` David Hildenbrand
@ 2022-03-31 17:29   ` Peter Xu
  2 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-03-31 17:29 UTC
  To: Andrey Ryabinin
  Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel,
	Philippe Mathieu-Daudé,
	yc-core, Paolo Bonzini

On Fri, Mar 25, 2022 at 06:40:13PM +0300, Andrey Ryabinin wrote:
> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
> function calls leads to leaking some memory.
> 
> ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
> for the new memory. These blocks only ever grow, never shrink, so
> qemu_ram_free() restores the RAM size back to its original state but
> doesn't touch the dirty memory bitmaps.
> 
> After qemu_ram_free() there is no way of knowing that dirty memory
> bitmaps were allocated beyond the current RAM size, so the next
> ram_block_add() will call dirty_memory_extend() again to allocate new
> bitmaps, overwriting the pointers to the bitmaps left over from the
> first ram_block_add()/dirty_memory_extend() calls.
> 
> Rework dirty_memory_extend() so it can also shrink the dirty memory
> maps, and rename it to dirty_memory_resize(). Fix the leak by
> shrinking the dirty memory maps in qemu_ram_free() when needed.
> 
> Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
>  include/exec/ramlist.h |  2 ++
>  softmmu/physmem.c      | 38 ++++++++++++++++++++++++++++++++------
>  2 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
> index 2ad2a81acc..019e238e7c 100644
> --- a/include/exec/ramlist.h
> +++ b/include/exec/ramlist.h
> @@ -41,6 +41,8 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
>  #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
>  typedef struct {
>      struct rcu_head rcu;
> +    unsigned int nr_blocks;

Nit: how about renaming it to nr_blocks_allocated?  Otherwise it's much
harder to distinguish it from the _inuse field below.

It would be great to have comments explaining the two fields.

> +    unsigned int nr_blocks_inuse;

If there's going to be a comment, we should definitely point out that this
variable is only set when a new array is about to replace this one.  IOW,
this field is not valid during most of this structure's lifecycle, IIUC,
and that's not obvious at all.
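
Something like this, perhaps (a sketch of the suggested comments, based on
my reading of the patch):

    typedef struct {
        struct rcu_head rcu;
        /* Number of bitmap blocks this array holds. */
        unsigned int nr_blocks;
        /*
         * Only set once this array has been retired via call_rcu():
         * the number of leading blocks[] entries that the replacement
         * array took over.  The RCU callback then frees
         * blocks[nr_blocks_inuse..nr_blocks).
         */
        unsigned int nr_blocks_inuse;
        unsigned long *blocks[];
    } DirtyMemoryBlocks;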

>      unsigned long *blocks[];
>  } DirtyMemoryBlocks;
>  
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index 32f76362bf..073ab37351 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1919,8 +1919,23 @@ void qemu_ram_msync(RAMBlock *block, ram_addr_t start, ram_addr_t length)
>      }
>  }
>  
> +static void dirty_memory_free(DirtyMemoryBlocks *blocks)
> +{
> +    int i;
> +
> +    /*
> +     * 'nr_blocks_inuse' is either greater than 'nr_blocks' (memory was
> +     * extended) or less than 'nr_blocks' (memory shrunk). In the latter
> +     * case, free all the blocks above 'nr_blocks_inuse'.
> +     */
> +    for (i = blocks->nr_blocks_inuse; i < blocks->nr_blocks; i++) {
> +        g_free(blocks->blocks[i]);
> +    }
> +    g_free(blocks);
> +}
> +
>  /* Called with ram_list.mutex held */
> -static void dirty_memory_extend(ram_addr_t old_ram_size,
> +static void dirty_memory_resize(ram_addr_t old_ram_size,
>                                  ram_addr_t new_ram_size)
>  {
>      ram_addr_t old_num_blocks = DIV_ROUND_UP(old_ram_size,
> @@ -1929,25 +1944,28 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
>                                               DIRTY_MEMORY_BLOCK_SIZE);
>      int i;
>  
> -    /* Only need to extend if block count increased */
> -    if (new_num_blocks <= old_num_blocks) {
> +    /* Only need to resize if block count changed */
> +    if (new_num_blocks == old_num_blocks) {
>          return;
>      }
>  
>      for (i = 0; i < DIRTY_MEMORY_NUM; i++) {
>          DirtyMemoryBlocks *old_blocks;
>          DirtyMemoryBlocks *new_blocks;
> +        unsigned int num_blocks = MAX(old_num_blocks, new_num_blocks);
>          int j;
>  
>          old_blocks = qatomic_rcu_read(&ram_list.dirty_memory[i]);
>          new_blocks = g_malloc(sizeof(*new_blocks) +
> -                              sizeof(new_blocks->blocks[0]) * new_num_blocks);
> +                              sizeof(new_blocks->blocks[0]) * num_blocks);
> +        new_blocks->nr_blocks = new_num_blocks;

Here new_num_blocks is assigned to nr_blocks, yet the allocation is sized
max(old, new).  Shouldn't the allocation still be new_num_blocks entries?

>  
>          if (old_num_blocks) {
>              memcpy(new_blocks->blocks, old_blocks->blocks,
>                     old_num_blocks * sizeof(old_blocks->blocks[0]));

Here we copy over all the old pointers even if old > new.

If we allocate the new array with new_num_blocks entries only, shouldn't we
copy min(old, new) here instead?
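
I.e. something like this (an untested sketch):

    unsigned int copy = MIN(old_num_blocks, new_num_blocks);

    new_blocks = g_malloc(sizeof(*new_blocks) +
                          sizeof(new_blocks->blocks[0]) * new_num_blocks);
    new_blocks->nr_blocks = new_num_blocks;

    if (copy) {
        /* Copy only the pointers that survive the resize. */
        memcpy(new_blocks->blocks, old_blocks->blocks,
               copy * sizeof(old_blocks->blocks[0]));
    }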

Thanks,

>          }
>  
> +        /* memory extend case (new > old): allocate new blocks */
>          for (j = old_num_blocks; j < new_num_blocks; j++) {
>              new_blocks->blocks[j] = bitmap_new(DIRTY_MEMORY_BLOCK_SIZE);
>          }
> @@ -1955,7 +1973,8 @@ static void dirty_memory_extend(ram_addr_t old_ram_size,
>          qatomic_rcu_set(&ram_list.dirty_memory[i], new_blocks);
>  
>          if (old_blocks) {
> -            g_free_rcu(old_blocks, rcu);
> +            old_blocks->nr_blocks_inuse = new_num_blocks;
> +            call_rcu(old_blocks, dirty_memory_free, rcu);
>          }
>      }
>  }
> @@ -2001,7 +2020,7 @@ static void ram_block_add(RAMBlock *new_block, Error **errp)
>      new_ram_size = MAX(old_ram_size,
>                (new_block->offset + new_block->max_length) >> TARGET_PAGE_BITS);
>      if (new_ram_size > old_ram_size) {
> -        dirty_memory_extend(old_ram_size, new_ram_size);
> +        dirty_memory_resize(old_ram_size, new_ram_size);
>      }
>      /* Keep the list sorted from biggest to smallest block.  Unlike QTAILQ,
>       * QLIST (which has an RCU-friendly variant) does not have insertion at
> @@ -2218,6 +2237,8 @@ static void reclaim_ramblock(RAMBlock *block)
>  
>  void qemu_ram_free(RAMBlock *block)
>  {
> +    ram_addr_t old_ram_size, new_ram_size;
> +
>      if (!block) {
>          return;
>      }
> @@ -2228,12 +2249,17 @@ void qemu_ram_free(RAMBlock *block)
>      }
>  
>      qemu_mutex_lock_ramlist();
> +    old_ram_size = last_ram_page();
> +
>      QLIST_REMOVE_RCU(block, next);
>      ram_list.mru_block = NULL;
>      /* Write list before version */
>      smp_wmb();
>      ram_list.version++;
>      call_rcu(block, reclaim_ramblock, rcu);
> +
> +    new_ram_size = last_ram_page();
> +    dirty_memory_resize(old_ram_size, new_ram_size);
>      qemu_mutex_unlock_ramlist();
>  }
>  
> -- 
> 2.34.1
> 

-- 
Peter Xu




* Re: [PATCH 2/2] softmmu/physmem: fix dirty memory bitmap memleak
  2022-03-31 16:14     ` Andrey Ryabinin
@ 2022-03-31 17:34       ` Peter Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Xu @ 2022-03-31 17:34 UTC
  To: Andrey Ryabinin
  Cc: Stefan Hajnoczi, David Hildenbrand, qemu-stable, qemu-devel,
	Philippe Mathieu-Daudé,
	yc-core, Paolo Bonzini

On Thu, Mar 31, 2022 at 07:14:12PM +0300, Andrey Ryabinin wrote:
> 
> 
> On 3/30/22 22:25, Peter Xu wrote:
> > On Fri, Mar 25, 2022 at 06:40:13PM +0300, Andrey Ryabinin wrote:
> >> The sequence of ram_block_add()/qemu_ram_free()/ram_block_add()
> >> function calls leads to leaking some memory.
> >>
> >> ram_block_add() calls dirty_memory_extend() to allocate bitmap blocks
> >> for the new memory. These blocks only ever grow, never shrink, so
> >> qemu_ram_free() restores the RAM size back to its original state but
> >> doesn't touch the dirty memory bitmaps.
> >>
> >> After qemu_ram_free() there is no way of knowing that dirty memory
> >> bitmaps were allocated beyond the current RAM size, so the next
> >> ram_block_add() will call dirty_memory_extend() again to allocate new
> >> bitmaps, overwriting the pointers to the bitmaps left over from the
> >> first ram_block_add()/dirty_memory_extend() calls.
> >>
> >> Rework dirty_memory_extend() so it can also shrink the dirty memory
> >> maps, and rename it to dirty_memory_resize(). Fix the leak by
> >> shrinking the dirty memory maps in qemu_ram_free() when needed.
> >>
> >> Fixes: 5b82b703b69a ("memory: RCU ram_list.dirty_memory[] for safe RAM hotplug")
> >> Cc: qemu-stable@nongnu.org
> >> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> >> ---
> >>  include/exec/ramlist.h |  2 ++
> >>  softmmu/physmem.c      | 38 ++++++++++++++++++++++++++++++++------
> >>  2 files changed, 34 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/include/exec/ramlist.h b/include/exec/ramlist.h
> >> index 2ad2a81acc..019e238e7c 100644
> >> --- a/include/exec/ramlist.h
> >> +++ b/include/exec/ramlist.h
> >> @@ -41,6 +41,8 @@ typedef struct RAMBlockNotifier RAMBlockNotifier;
> >>  #define DIRTY_MEMORY_BLOCK_SIZE ((ram_addr_t)256 * 1024 * 8)
> >>  typedef struct {
> >>      struct rcu_head rcu;
> >> +    unsigned int nr_blocks;
> >> +    unsigned int nr_blocks_inuse;
> >>      unsigned long *blocks[];
> >>  } DirtyMemoryBlocks;
> > 
> > The problem looks valid, but can we avoid introducing these variables at
> > all?
> > 
> > IIUC what we're missing here is the proper releasing of dirty blocks when
> > RAM is released.  IMHO as long as we properly release the extra dirty
> > memory blocks in qemu_ram_free(), then last_ram_page() will reflect the
> > existing dirty memory block size.  That looks simpler, if I'm correct.
> > 
> 
> ram_list.dirty_memory is RCU protected, which means we can't free it immediately.
> Freeing must be delayed until the end of the RCU grace period.
> One way to do it is using an RCU callback, like in this patch. That's why we need these
> variables - to pass the information into the RCU callback.
> 
> Another way is using synchronize_rcu() before freeing the memory. In that case
> the variables won't be needed. But it's expensive.
> Also I'm not sure if using synchronize_rcu() under a mutex lock is a good idea.
> Perhaps dirty_memory_resize() would need to be reworked somehow to move
> synchronize_rcu() and the freeing outside of the mutex_lock()/unlock() section.

I suggested that only because at first glance I failed to read the current
patch clearly due to some trivial details (I commented inline), which made
me think that merging extend/shrink wasn't necessary.  But I agree it looks
mostly correct, and call_rcu() should indeed be preferred.

Thanks,

-- 
Peter Xu



