kvmarm.lists.cs.columbia.edu archive mirror
From: Gavin Shan <gshan@redhat.com>
To: Peter Xu <peterx@redhat.com>, Marc Zyngier <maz@kernel.org>
Cc: kvm@vger.kernel.org, catalin.marinas@arm.com,
	andrew.jones@linux.dev, will@kernel.org, shan.gavin@gmail.com,
	bgardon@google.com, dmatlack@google.com, pbonzini@redhat.com,
	zhenyzha@redhat.com, shuah@kernel.org,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v4 3/6] KVM: arm64: Enable ring-based dirty memory tracking
Date: Thu, 29 Sep 2022 21:31:34 +1000	[thread overview]
Message-ID: <ddc4166c-81b6-2f7b-87a7-4af3d7db888a@redhat.com> (raw)
In-Reply-To: <d0beb9bd-5295-adb6-a473-c131d6102947@redhat.com>

Hi Peter and Marc,

On 9/29/22 7:50 PM, Gavin Shan wrote:
> On 9/29/22 12:52 AM, Peter Xu wrote:
>> On Wed, Sep 28, 2022 at 09:25:34AM +0100, Marc Zyngier wrote:
>>> On Wed, 28 Sep 2022 00:47:43 +0100,
>>> Gavin Shan <gshan@redhat.com> wrote:
>>>
>>>> I have a rough idea, as below. I'd appreciate your comments before I go
>>>> ahead with the prototype. The overall idea is to introduce another
>>>> dirty ring for KVM (kvm-dirty-ring). It's updated and visited separately
>>>> from the dirty ring for each vcpu (vcpu-dirty-ring).
>>>>
>>>>     - When the various VGIC/ITS table base addresses are specified, kvm-dirty-ring
>>>>       entries are added to mark those pages as 'always-dirty'. In mark_page_dirty_in_slot(),
>>>>       those 'always-dirty' pages will be skipped, no entries pushed to vcpu-dirty-ring.
>>>>
>>>>     - Similar to vcpu-dirty-ring, kvm-dirty-ring is accessed from userspace through
>>>>       mmap(kvm->fd). However, there won't be a similar reset interface. It means
>>>>       'struct kvm_dirty_gfn::flags' won't track any information as we do for
>>>>       vcpu-dirty-ring. In this regard, kvm-dirty-ring is purely a shared buffer to
>>>>       advertise 'always-dirty' pages from host to userspace.
>>>>     - For QEMU, the shutdown/suspend/resume cases won't concern us any more. The
>>>>       only concerning case is migration. When the migration is about to complete,
>>>>       kvm-dirty-ring entries are fetched and the dirty bits are propagated to the global
>>>>       dirty page bitmap and the RAMBlock's dirty page bitmap. For this, I'm still reading
>>>>       the code to find the best spot to do it.
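
To make the 'always-dirty' idea in the first bullet concrete, here is a tiny
userspace model of it. All names here are invented for illustration (this is
not KVM code, and this design was ultimately not the one pursued): pages
registered as always-dirty, such as the VGIC/ITS table pages, are advertised
once and would be skipped by the per-vcpu push path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invented model of the proposed kvm-dirty-ring: a fixed set of
 * 'always-dirty' GFNs (e.g. VGIC/ITS table pages) that userspace
 * reads once, so they never need to flow through a vcpu ring. */
#define MAX_ALWAYS_DIRTY 8

struct kvm_always_dirty {
	uint64_t gfns[MAX_ALWAYS_DIRTY];
	size_t count;
};

static void always_dirty_add(struct kvm_always_dirty *ad, uint64_t gfn)
{
	if (ad->count < MAX_ALWAYS_DIRTY)
		ad->gfns[ad->count++] = gfn;
}

/* Returns true if the GFN is covered by the always-dirty set, in which
 * case mark_page_dirty_in_slot() would skip pushing a vcpu ring entry. */
static bool always_dirty_contains(const struct kvm_always_dirty *ad,
				  uint64_t gfn)
{
	for (size_t i = 0; i < ad->count; i++)
		if (ad->gfns[i] == gfn)
			return true;
	return false;
}
```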
>>>
>>> I think it makes a lot of sense to have a way to log writes that are
>>> not generated by a vcpu, such as the GIC and maybe other things in the
>>> future, such as DMA traffic (some SMMUs are able to track dirty pages
>>> as well).
>>>
>>> However, I don't really see the point in inventing a new mechanism for
>>> that. Why don't we simply allow non-vcpu dirty pages to be tracked in
>>> the dirty *bitmap*?
>>>
>>> From a kernel perspective, this is dead easy:
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 5b064dbadaf4..ae9138f29d51 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -3305,7 +3305,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>>       struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>>   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>>> -    if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>>> +    if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>>>           return;
>>>   #endif
>>> @@ -3313,10 +3313,11 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>>           unsigned long rel_gfn = gfn - memslot->base_gfn;
>>>           u32 slot = (memslot->as_id << 16) | memslot->id;
>>> -        if (kvm->dirty_ring_size)
>>> +        if (vcpu && kvm->dirty_ring_size)
>>>               kvm_dirty_ring_push(&vcpu->dirty_ring,
>>>                           slot, rel_gfn);
>>> -        else
>>> +        /* non-vcpu dirtying ends up in the global bitmap */
>>> +        if (!vcpu && memslot->dirty_bitmap)
>>>               set_bit_le(rel_gfn, memslot->dirty_bitmap);
>>>       }
>>>   }
>>>
>>> though I'm sure there are a few more things to it.
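
The intent of that hunk can be modeled outside the kernel. The sketch below is
a minimal userspace model, not the real KVM code: the structs, ring, and
bitmap layout are all mocked up for illustration. It demonstrates the dispatch
the patch aims for, where a running vcpu pushes to its per-vcpu ring while a
non-vcpu writer falls back to the memslot's dirty bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock types standing in for the real KVM structures. */
struct dirty_ring { uint32_t slots[16]; uint64_t gfns[16]; int count; };
struct memslot    { uint64_t base_gfn; uint16_t as_id, id; uint64_t bitmap[4]; };
struct kvm        { int dirty_ring_size; };
struct kvm_vcpu   { struct kvm *kvm; struct dirty_ring ring; };

static void ring_push(struct dirty_ring *r, uint32_t slot, uint64_t rel_gfn)
{
	r->slots[r->count] = slot;
	r->gfns[r->count] = rel_gfn;
	r->count++;
}

/* Models the post-patch mark_page_dirty_in_slot(): vcpu writes go to the
 * per-vcpu ring; non-vcpu writes land in the memslot's dirty bitmap. */
static void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
				    struct memslot *memslot, uint64_t gfn)
{
	if (vcpu && vcpu->kvm != kvm)	/* was WARN_ON_ONCE() in the diff */
		return;

	uint64_t rel_gfn = gfn - memslot->base_gfn;
	uint32_t slot = ((uint32_t)memslot->as_id << 16) | memslot->id;

	if (vcpu && kvm->dirty_ring_size)
		ring_push(&vcpu->ring, slot, rel_gfn);

	/* non-vcpu dirtying ends up in the global bitmap */
	if (!vcpu)
		memslot->bitmap[rel_gfn / 64] |= 1ULL << (rel_gfn % 64);
}
```

The point the model makes visible is that the two paths are disjoint: a GIC/ITS
save (no running vcpu) never touches any ring and so cannot overflow one.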
>>
>> Yes, currently the bitmaps are not created when rings are enabled.
>> kvm_prepare_memory_region() has:
>>
>>         else if (!kvm->dirty_ring_size) {
>>             r = kvm_alloc_dirty_bitmap(new);
>>
>> But I think maybe that's a solution worth considering.  Using the rings
>> has a major challenge in the limitation of the ring size, so that for e.g. an
>> ioctl we need to make sure the pages dirtied within the ioctl procedure
>> will not be more than the ring can take.  Using the dirty bitmap for a last
>> phase sync of a constant (but still very small) amount of dirty pages does
>> sound reasonable and can avoid that complexity.  The payoff is we'll need
>> to allocate both the rings and the bitmaps.
>>
> 
> Ok. I was thinking of using the bitmap to convey the dirty pages for
> this particular case, where we don't have a running vcpu. The concern I had
> is the natural difference between a ring and a bitmap: the ring-buffer is
> discrete, compared to a bitmap. Besides, it sounds a little strange to
> have two different sets of meta-data to track the same data (dirty pages).
> 
> However, the bitmap is an easier way than a per-vm ring. The constraints
> of a per-vm ring are just as Peter pointed out. So let's reuse the bitmap to
> convey the dirty pages for this particular case. I think the payoff, an
> extra bitmap, is acceptable. For this, we need another capability
> (KVM_CAP_DIRTY_LOG_RING_BITMAP?) so that QEMU can collect the dirty
> bitmap in the last phase of migration.
> 
> If all of us agree on this, I can send another kernel patch to address
> this. QEMU still needs more patches so that the feature can be supported.
> 

I've had the following PATCH[v5 3/7] to reuse the bitmap for these particular
cases. The KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls are used to visit
the bitmap. The new capability is advertised by KVM_CAP_DIRTY_LOG_RING_BITMAP.
Note those two ioctls are currently disabled when dirty-ring is enabled, so we
need to enable them accordingly.
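
On the QEMU side, the final sync would then boil down to OR-ing the bitmap
fetched via KVM_GET_DIRTY_LOG into the RAMBlock's dirty bitmap. The helper
below is a self-contained model of just that merge step (the ioctl plumbing is
omitted, and the names are illustrative, not QEMU's actual API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: merge a slot bitmap fetched via KVM_GET_DIRTY_LOG
 * into a destination dirty bitmap, returning how many pages the source
 * marked dirty. In real QEMU the destination would be the RAMBlock's
 * dirty bitmap. */
static size_t merge_dirty_bitmap(uint64_t *dst, const uint64_t *src,
				 size_t nr_words)
{
	size_t dirty = 0;

	for (size_t i = 0; i < nr_words; i++) {
		dst[i] |= src[i];
		dirty += (size_t)__builtin_popcountll(src[i]);
	}
	return dirty;
}
```

Since the VGIC/ITS tables are only a handful of pages, doing this once in the
last phase of migration should be cheap.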

    PATCH[v5 3/7] KVM: x86: Use bitmap in ring-based dirty page tracking

I would like to post v5 after someone reviews or acks kvm/selftests part
of this series.

[...]

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Thread overview: 31+ messages
2022-09-27  0:54 [PATCH v4 0/6] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-09-27  0:54 ` [PATCH v4 1/6] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
2022-09-27 10:26   ` Marc Zyngier
2022-09-27 11:31     ` Gavin Shan
2022-09-27 16:00     ` Peter Xu
2022-09-27  0:54 ` [PATCH v4 2/6] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-09-27 16:00   ` Peter Xu
2022-09-27  0:54 ` [PATCH v4 3/6] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-09-27 16:02   ` Peter Xu
2022-09-27 17:32     ` Marc Zyngier
2022-09-27 18:21       ` Peter Xu
2022-09-27 23:47         ` Gavin Shan
2022-09-28  8:25           ` Marc Zyngier
2022-09-28 14:52             ` Peter Xu
2022-09-29  9:50               ` Gavin Shan
2022-09-29 11:31                 ` Gavin Shan [this message]
2022-09-29 14:44                   ` Marc Zyngier
2022-09-29 14:32                 ` Peter Xu
2022-09-30  9:28                   ` Marc Zyngier
2022-09-29 14:42                 ` Marc Zyngier
2022-10-04  4:26                   ` Gavin Shan
2022-10-04  4:26                     ` Gavin Shan
2022-10-04 13:26                     ` Peter Xu
2022-10-04 13:26                       ` Peter Xu
2022-10-04 15:45                     ` Marc Zyngier
2022-10-04 15:45                       ` Marc Zyngier
2022-09-29 14:34               ` Marc Zyngier
2022-09-27  0:54 ` [PATCH v4 4/6] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-09-27  0:54 ` [PATCH v4 5/6] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-09-27  0:54 ` [PATCH v4 6/6] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-09-27 10:30 ` [PATCH v4 0/6] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier
