kvmarm.lists.cs.columbia.edu archive mirror
From: Gavin Shan <gshan@redhat.com>
To: Peter Xu <peterx@redhat.com>, Marc Zyngier <maz@kernel.org>
Cc: kvm@vger.kernel.org, catalin.marinas@arm.com,
	andrew.jones@linux.dev, will@kernel.org, shan.gavin@gmail.com,
	bgardon@google.com, dmatlack@google.com, pbonzini@redhat.com,
	zhenyzha@redhat.com, shuah@kernel.org,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v4 3/6] KVM: arm64: Enable ring-based dirty memory tracking
Date: Thu, 29 Sep 2022 21:31:34 +1000	[thread overview]
Message-ID: <ddc4166c-81b6-2f7b-87a7-4af3d7db888a@redhat.com> (raw)
In-Reply-To: <d0beb9bd-5295-adb6-a473-c131d6102947@redhat.com>

Hi Peter and Marc,

On 9/29/22 7:50 PM, Gavin Shan wrote:
> On 9/29/22 12:52 AM, Peter Xu wrote:
>> On Wed, Sep 28, 2022 at 09:25:34AM +0100, Marc Zyngier wrote:
>>> On Wed, 28 Sep 2022 00:47:43 +0100,
>>> Gavin Shan <gshan@redhat.com> wrote:
>>>
>>>> I have a rough idea, as below. I'd appreciate your comments before I go
>>>> ahead with the prototype. The overall idea is to introduce another
>>>> dirty ring for KVM (kvm-dirty-ring). It's updated and visited separately
>>>> from the dirty ring for each vcpu (vcpu-dirty-ring).
>>>>
>>>>     - When the various VGIC/ITS table base addresses are specified, kvm-dirty-ring
>>>>       entries are added to mark those pages as 'always-dirty'. In mark_page_dirty_in_slot(),
>>>>       those 'always-dirty' pages will be skipped, no entries pushed to vcpu-dirty-ring.
>>>>
>>>>     - Similar to vcpu-dirty-ring, kvm-dirty-ring is accessed from userspace through
>>>>       mmap(kvm->fd). However, there won't be a similar reset interface. It means
>>>>       'struct kvm_dirty_gfn::flags' won't track any information as we do for
>>>>       vcpu-dirty-ring. In this regard, kvm-dirty-ring is purely a shared buffer to
>>>>       advertise 'always-dirty' pages from host to userspace.
>>>>     - For QEMU, the shutdown/suspend/resume cases won't concern us any more. The
>>>>       only concerning case is migration. When the migration is about to complete,
>>>>       kvm-dirty-ring entries are fetched and the dirty bits are propagated to the global
>>>>       dirty page bitmap and the RAMBlock's dirty page bitmap. For this, I'm still reading
>>>>       the code to find the best spot to do it.
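
To make the 'always-dirty' idea in the first bullet concrete, here is a tiny
userspace model of it. All names here are invented for illustration (this is
not KVM code, and this design was ultimately not the one pursued): pages
registered as always-dirty, such as the VGIC/ITS table pages, are advertised
once and would be skipped by the per-vcpu push path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Invented model of the proposed kvm-dirty-ring: a fixed set of
 * 'always-dirty' GFNs (e.g. VGIC/ITS table pages) that userspace
 * reads once, so they never need to flow through a vcpu ring. */
#define MAX_ALWAYS_DIRTY 8

struct kvm_always_dirty {
	uint64_t gfns[MAX_ALWAYS_DIRTY];
	size_t count;
};

static void always_dirty_add(struct kvm_always_dirty *ad, uint64_t gfn)
{
	if (ad->count < MAX_ALWAYS_DIRTY)
		ad->gfns[ad->count++] = gfn;
}

/* Returns true if the GFN is covered by the always-dirty set, in which
 * case mark_page_dirty_in_slot() would skip pushing a vcpu ring entry. */
static bool always_dirty_contains(const struct kvm_always_dirty *ad,
				  uint64_t gfn)
{
	for (size_t i = 0; i < ad->count; i++)
		if (ad->gfns[i] == gfn)
			return true;
	return false;
}
```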
>>>
>>> I think it makes a lot of sense to have a way to log writes that are
>>> not generated by a vcpu, such as the GIC and maybe other things in the
>>> future, such as DMA traffic (some SMMUs are able to track dirty pages
>>> as well).
>>>
>>> However, I don't really see the point in inventing a new mechanism for
>>> that. Why don't we simply allow non-vcpu dirty pages to be tracked in
>>> the dirty *bitmap*?
>>>
>>> From a kernel perspective, this is dead easy:
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 5b064dbadaf4..ae9138f29d51 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -3305,7 +3305,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>>       struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>>   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>>> -    if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>>> +    if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>>>           return;
>>>   #endif
>>> @@ -3313,10 +3313,11 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>>           unsigned long rel_gfn = gfn - memslot->base_gfn;
>>>           u32 slot = (memslot->as_id << 16) | memslot->id;
>>> -        if (kvm->dirty_ring_size)
>>> +        if (vcpu && kvm->dirty_ring_size)
>>>               kvm_dirty_ring_push(&vcpu->dirty_ring,
>>>                           slot, rel_gfn);
>>> -        else
>>> +        /* non-vcpu dirtying ends up in the global bitmap */
>>> +        if (!vcpu && memslot->dirty_bitmap)
>>>               set_bit_le(rel_gfn, memslot->dirty_bitmap);
>>>       }
>>>   }
>>>
>>> though I'm sure there are a few more things to it.
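
The intent of that hunk can be modeled outside the kernel. The sketch below is
a minimal userspace model, not the real KVM code: the structs, ring, and
bitmap layout are all mocked up for illustration. It demonstrates the dispatch
the patch aims for, where a running vcpu pushes to its per-vcpu ring while a
non-vcpu writer falls back to the memslot's dirty bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock types standing in for the real KVM structures. */
struct dirty_ring { uint32_t slots[16]; uint64_t gfns[16]; int count; };
struct memslot    { uint64_t base_gfn; uint16_t as_id, id; uint64_t bitmap[4]; };
struct kvm        { int dirty_ring_size; };
struct kvm_vcpu   { struct kvm *kvm; struct dirty_ring ring; };

static void ring_push(struct dirty_ring *r, uint32_t slot, uint64_t rel_gfn)
{
	r->slots[r->count] = slot;
	r->gfns[r->count] = rel_gfn;
	r->count++;
}

/* Models the post-patch mark_page_dirty_in_slot(): vcpu writes go to the
 * per-vcpu ring; non-vcpu writes land in the memslot's dirty bitmap. */
static void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
				    struct memslot *memslot, uint64_t gfn)
{
	if (vcpu && vcpu->kvm != kvm)	/* was WARN_ON_ONCE() in the diff */
		return;

	uint64_t rel_gfn = gfn - memslot->base_gfn;
	uint32_t slot = ((uint32_t)memslot->as_id << 16) | memslot->id;

	if (vcpu && kvm->dirty_ring_size)
		ring_push(&vcpu->ring, slot, rel_gfn);

	/* non-vcpu dirtying ends up in the global bitmap */
	if (!vcpu)
		memslot->bitmap[rel_gfn / 64] |= 1ULL << (rel_gfn % 64);
}
```

The point the model makes visible is that the two paths are disjoint: a GIC/ITS
save (no running vcpu) never touches any ring and so cannot overflow one.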
>>
>> Yes, currently the bitmaps are not created when rings are enabled.
>> kvm_prepare_memory_region() has:
>>
>>         else if (!kvm->dirty_ring_size) {
>>             r = kvm_alloc_dirty_bitmap(new);
>>
>> But I think maybe that's a solution worth considering.  Using the rings
>> has a major challenge in the limitation of the ring size, so that for e.g. an
>> ioctl we need to make sure the pages dirtied within the ioctl procedure
>> will not be more than the ring can take.  Using the dirty bitmap for a last
>> phase sync of a constant (but still very small) amount of dirty pages does
>> sound reasonable and can avoid that complexity.  The payoff is we'll need
>> to allocate both the rings and the bitmaps.
>>
> 
> Ok. I was thinking of using the bitmap to convey the dirty pages for
> this particular case, where we don't have a running vcpu. The concern I had
> is the natural difference between a ring and a bitmap: the ring-buffer is
> discrete, compared to a bitmap. Besides, it sounds a little strange to
> have two different sets of meta-data to track the same data (dirty pages).
> 
> However, the bitmap is an easier way than a per-vm ring. The constraints
> of a per-vm ring are just as Peter pointed out. So let's reuse the bitmap to
> convey the dirty pages for this particular case. I think the payoff, an
> extra bitmap, is acceptable. For this, we need another capability
> (KVM_CAP_DIRTY_LOG_RING_BITMAP?) so that QEMU can collect the dirty
> bitmap in the last phase of migration.
> 
> If all of us agree on this, I can send another kernel patch to address
> this. QEMU still needs more patches so that the feature can be supported.
> 

I've had the following PATCH[v5 3/7] to reuse the bitmap for these particular
cases. The KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls are used to visit
the bitmap. The new capability is advertised by KVM_CAP_DIRTY_LOG_RING_BITMAP.
Note those two ioctls are currently disabled when dirty-ring is enabled, so we
need to enable them accordingly.
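
On the QEMU side, the final sync would then boil down to OR-ing the bitmap
fetched via KVM_GET_DIRTY_LOG into the RAMBlock's dirty bitmap. The helper
below is a self-contained model of just that merge step (the ioctl plumbing is
omitted, and the names are illustrative, not QEMU's actual API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: merge a slot bitmap fetched via KVM_GET_DIRTY_LOG
 * into a destination dirty bitmap, returning how many pages the source
 * marked dirty. In real QEMU the destination would be the RAMBlock's
 * dirty bitmap. */
static size_t merge_dirty_bitmap(uint64_t *dst, const uint64_t *src,
				 size_t nr_words)
{
	size_t dirty = 0;

	for (size_t i = 0; i < nr_words; i++) {
		dst[i] |= src[i];
		dirty += (size_t)__builtin_popcountll(src[i]);
	}
	return dirty;
}
```

Since the VGIC/ITS tables are only a handful of pages, doing this once in the
last phase of migration should be cheap.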

    PATCH[v5 3/7] KVM: x86: Use bitmap in ring-based dirty page tracking

I would like to post v5 after someone reviews or acks kvm/selftests part
of this series.

[...]

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Thread overview: 31+ messages
2022-09-27  0:54 [PATCH v4 0/6] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-09-27  0:54 ` [PATCH v4 1/6] KVM: x86: Introduce KVM_REQ_RING_SOFT_FULL Gavin Shan
2022-09-27 10:26   ` Marc Zyngier
2022-09-27 11:31     ` Gavin Shan
2022-09-27 16:00     ` Peter Xu
2022-09-27  0:54 ` [PATCH v4 2/6] KVM: x86: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-09-27 16:00   ` Peter Xu
2022-09-27  0:54 ` [PATCH v4 3/6] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-09-27 16:02   ` Peter Xu
2022-09-27 17:32     ` Marc Zyngier
2022-09-27 18:21       ` Peter Xu
2022-09-27 23:47         ` Gavin Shan
2022-09-28  8:25           ` Marc Zyngier
2022-09-28 14:52             ` Peter Xu
2022-09-29  9:50               ` Gavin Shan
2022-09-29 11:31                 ` Gavin Shan [this message]
2022-09-29 14:44                   ` Marc Zyngier
2022-09-29 14:32                 ` Peter Xu
2022-09-30  9:28                   ` Marc Zyngier
2022-09-29 14:42                 ` Marc Zyngier
2022-10-04  4:26                   ` Gavin Shan
2022-10-04  4:26                     ` Gavin Shan
2022-10-04 13:26                     ` Peter Xu
2022-10-04 13:26                       ` Peter Xu
2022-10-04 15:45                     ` Marc Zyngier
2022-10-04 15:45                       ` Marc Zyngier
2022-09-29 14:34               ` Marc Zyngier
2022-09-27  0:54 ` [PATCH v4 4/6] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-09-27  0:54 ` [PATCH v4 5/6] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-09-27  0:54 ` [PATCH v4 6/6] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-09-27 10:30 ` [PATCH v4 0/6] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier
