From: Zenghui Yu <yuzenghui@huawei.com>
To: Peter Xu <peterx@redhat.com>
Cc: pbonzini@redhat.com, qemu-devel@nongnu.org, wanghaibin.wang@huawei.com
Subject: Re: [PATCH] kvm: Take into account the unaligned section size when preparing bitmap
Date: Wed, 9 Dec 2020 10:33:41 +0800	[thread overview]
Message-ID: <bb4bcc8b-1d36-9529-d7cd-4d93162d092f@huawei.com> (raw)
In-Reply-To: <20201208151654.GA6432@xz-x1>

Hi Peter,

Thanks for having a look at it.

On 2020/12/8 23:16, Peter Xu wrote:
> Hi, Zenghui,
> 
> On Tue, Dec 08, 2020 at 07:40:13PM +0800, Zenghui Yu wrote:
>> The kernel KVM_CLEAR_DIRTY_LOG interface has alignment requirements on both
>> the start and the size of the given range of pages. We have been careful to
>> handle the unaligned cases when performing CLEAR on one slot, but it seems
>> that we forgot to take the unaligned *size* case into account when preparing
>> the bitmap for the interface, and we may end up clearing the dirty status of
>> pages outside of [start, start + size).
> 
> Thanks for the patch, though my understanding is that this is not a bug.
> 
> Please have a look at kvm_memslot_init_dirty_bitmap() where we'll allocate the
> dirty bitmap to be aligned to 8 bytes (assuming that's the possible max of the
> value sizeof(unsigned long)).  That exactly covers 64 pages.
> 
> So here, as long as start_delta == 0 (so the value of "bmap_npages - size / psize"
> won't really matter much, imho), we'll definitely have KVMSlot.dirty_bmap
> long enough to cover the range we'd like to clear.

I agree.  But actually I'm not saying that KVMSlot.dirty_bmap is not
long enough.  What I had in mind is something like:

     // psize = qemu_real_host_page_size;
     // slot.start_addr = 0;
     // slot.memory_size = 64 * psize;

     kvm_log_clear_one_slot(slot, as, 0 * psize, 32 * psize);   --> [1]
     kvm_log_clear_one_slot(slot, as, 32 * psize, 32 * psize);  --> [2]

So the @size is not aligned to 64 pages.  Before this patch, we would
clear the dirty status of all pages (0-63) through [1].  It looks to me
like this violates the caller's expectation, since we only want to clear
pages (0-31).

As I said, I don't think this will happen in practice -- the migration
code should always provide us with a 64-page aligned section (right?).
I'm just thinking about the correctness of the specific algorithm used
by kvm_log_clear_one_slot().

Or maybe I've missed something obvious ;-) ?


Thanks,
Zenghui

> Note that the size of KVMSlot.dirty_bmap can be bigger than the actual size
> of the kvm memslot; however, since kvm_memslot_init_dirty_bitmap() initializes
> it to all zeroes, the extra bits will always be zero for the whole lifecycle of
> the vm/bitmap.
> 
> Thanks,
> 
>>
>> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
>> ---
>>   accel/kvm/kvm-all.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index bed2455ca5..05d323ba1f 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -747,7 +747,7 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>       assert(bmap_start % BITS_PER_LONG == 0);
>>       /* We should never do log_clear before log_sync */
>>       assert(mem->dirty_bmap);
>> -    if (start_delta) {
>> +    if (start_delta || bmap_npages - size / psize) {
>>           /* Slow path - we need to manipulate a temp bitmap */
>>           bmap_clear = bitmap_new(bmap_npages);
>>           bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
>> @@ -760,7 +760,10 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>           bitmap_clear(bmap_clear, 0, start_delta);
>>           d.dirty_bitmap = bmap_clear;
>>       } else {
>> -        /* Fast path - start address aligns well with BITS_PER_LONG */
>> +        /*
>> +         * Fast path - both start and size align well with BITS_PER_LONG
>> +         * (or the end of memory slot)
>> +         */
>>           d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
>>       }
>>   
>> -- 
>> 2.19.1
>>
>>
> 


Thread overview: 16+ messages
2020-12-08 11:40 [PATCH] kvm: Take into account the unaligned section size when preparing bitmap Zenghui Yu
2020-12-08 15:16 ` Peter Xu
2020-12-09  2:33   ` Zenghui Yu [this message]
2020-12-09 21:09     ` Peter Xu
2020-12-10  4:23       ` Zenghui Yu
2020-12-10  1:46     ` zhukeqian
2020-12-10  2:08       ` Peter Xu
2020-12-10  2:53         ` zhukeqian
2020-12-10  3:31           ` Zenghui Yu
2020-12-10 14:50           ` Peter Xu
2020-12-11  1:13             ` zhukeqian
2020-12-11 15:25               ` Peter Xu
2020-12-14  2:14                 ` zhukeqian
2020-12-14 15:36                   ` Peter Xu
2020-12-15  7:23                     ` zhukeqian
2020-12-15  7:39                       ` Zenghui Yu
