From: Gavin Shan <gshan@redhat.com>
To: Peter Xu <peterx@redhat.com>, Marc Zyngier <maz@kernel.org>
Cc: kvm@vger.kernel.org, catalin.marinas@arm.com,
andrew.jones@linux.dev, will@kernel.org, shan.gavin@gmail.com,
bgardon@google.com, dmatlack@google.com, pbonzini@redhat.com,
zhenyzha@redhat.com, shuah@kernel.org,
kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v4 3/6] KVM: arm64: Enable ring-based dirty memory tracking
Date: Thu, 29 Sep 2022 21:31:34 +1000 [thread overview]
Message-ID: <ddc4166c-81b6-2f7b-87a7-4af3d7db888a@redhat.com> (raw)
In-Reply-To: <d0beb9bd-5295-adb6-a473-c131d6102947@redhat.com>
Hi Peter and Marc,
On 9/29/22 7:50 PM, Gavin Shan wrote:
> On 9/29/22 12:52 AM, Peter Xu wrote:
>> On Wed, Sep 28, 2022 at 09:25:34AM +0100, Marc Zyngier wrote:
>>> On Wed, 28 Sep 2022 00:47:43 +0100,
>>> Gavin Shan <gshan@redhat.com> wrote:
>>>
>>>> I have a rough idea, outlined below. I'd appreciate your comments before I
>>>> go ahead with the prototype. The overall idea is to introduce another
>>>> dirty ring for KVM (kvm-dirty-ring), updated and visited separately
>>>> from the per-vcpu dirty ring (vcpu-dirty-ring).
>>>>
>>>> - When the various VGIC/ITS table base addresses are specified, kvm-dirty-ring
>>>> entries are added to mark those pages as 'always-dirty'. In mark_page_dirty_in_slot(),
>>>> those 'always-dirty' pages will be skipped, no entries pushed to vcpu-dirty-ring.
>>>>
>>>> - Similar to vcpu-dirty-ring, kvm-dirty-ring is accessed from userspace through
>>>> mmap(kvm->fd). However, there won't be a similar reset interface, meaning
>>>> 'struct kvm_dirty_gfn::flags' won't track any state as it does for
>>>> vcpu-dirty-ring. In this regard, kvm-dirty-ring is purely a shared buffer to
>>>> advertise 'always-dirty' pages from host to userspace.
>>>> - For QEMU, the shutdown/suspend/resume cases won't concern us any more. The
>>>> only remaining concern is migration. When migration is about to complete,
>>>> kvm-dirty-ring entries are fetched and the dirty bits are propagated to the
>>>> global dirty page bitmap and the RAMBlock's dirty page bitmap. For this, I'm
>>>> still reading the code to find the best spot to do it.
>>>
>>> I think it makes a lot of sense to have a way to log writes that are
>>> not generated by a vcpu, such as the GIC and maybe other things in the
>>> future, such as DMA traffic (some SMMUs are able to track dirty pages
>>> as well).
>>>
>>> However, I don't really see the point in inventing a new mechanism for
>>> that. Why don't we simply allow non-vcpu dirty pages to be tracked in
>>> the dirty *bitmap*?
>>>
>>> From a kernel perspective, this is dead easy:
>>>
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 5b064dbadaf4..ae9138f29d51 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -3305,7 +3305,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>> struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>> #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>>> - if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>>> + if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>>> return;
>>> #endif
>>> @@ -3313,10 +3313,11 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>> unsigned long rel_gfn = gfn - memslot->base_gfn;
>>> u32 slot = (memslot->as_id << 16) | memslot->id;
>>> - if (kvm->dirty_ring_size)
>>> + if (vcpu && kvm->dirty_ring_size)
>>> kvm_dirty_ring_push(&vcpu->dirty_ring,
>>> slot, rel_gfn);
>>> - else
>>> + /* non-vcpu dirtying ends up in the global bitmap */
>>> + if (!vcpu && memslot->dirty_bitmap)
>>> set_bit_le(rel_gfn, memslot->dirty_bitmap);
>>> }
>>> }
>>>
>>> though I'm sure there are a few more things to it.
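To make the routing in the diff above concrete, here is a minimal userspace model of it. All names below are illustrative toys, not the real KVM structures: a dirtied gfn goes to the vcpu ring when a vcpu context exists, and falls back to the memslot's bitmap otherwise.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the dispatch in mark_page_dirty_in_slot(); the
 * struct layouts and helper names here are mine, not KVM's. */
#define RING_SIZE 8
#define BITMAP_LONGS 4
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct toy_ring {
	unsigned int count;
	unsigned long gfns[RING_SIZE];
};

struct toy_slot {
	unsigned long base_gfn;
	unsigned long dirty_bitmap[BITMAP_LONGS];
};

/* Mirrors set_bit_le() semantics on a little-endian host. */
static void toy_set_bit(unsigned long nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static bool toy_test_bit(unsigned long nr, const unsigned long *map)
{
	return map[nr / BITS_PER_LONG] & (1UL << (nr % BITS_PER_LONG));
}

/* vcpu writes go to the ring; non-vcpu writes go to the bitmap,
 * matching the two branches in the diff above. */
static void toy_mark_page_dirty(struct toy_ring *vcpu_ring,
				struct toy_slot *slot, unsigned long gfn)
{
	unsigned long rel_gfn = gfn - slot->base_gfn;

	if (vcpu_ring)
		vcpu_ring->gfns[vcpu_ring->count++ % RING_SIZE] = rel_gfn;
	else
		toy_set_bit(rel_gfn, slot->dirty_bitmap);
}
```

The point of the model is only the dispatch: with a vcpu the bitmap is never touched, and without one (e.g. a GIC/ITS table save) the write still lands somewhere userspace can find it.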
>>
>> Yes, currently the bitmaps are not created when rings are enabled.
>> kvm_prepare_memory_region() has:
>>
>> else if (!kvm->dirty_ring_size) {
>> r = kvm_alloc_dirty_bitmap(new);
>>
>> But I think maybe that's a solution worth considering. Using the rings
>> has a major challenge in the limitation of ring size: for e.g. an
>> ioctl we need to make sure the pages dirtied within the ioctl
>> will not exceed what the ring can take. Using the dirty bitmap for a last
>> phase sync of a constant (and still very small) number of dirty pages does
>> sound reasonable and avoids that complexity. The cost is that we'll need
>> to allocate both the rings and the bitmaps.
>>
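The ring-size constraint Peter describes can be sketched as a capacity check before an operation that will dirty N pages. Again a toy model with made-up names, not KVM code:

```c
#include <stdbool.h>

/* Toy model of the hazard: a fixed-size ring of dirty-gfn entries
 * with a producer count and a reset (harvested) count. An operation
 * that will dirty n_pages must fit in the free space, or it would
 * overwrite entries userspace has not yet collected. */
struct toy_dirty_ring {
	unsigned int size;	/* total entries; a power of two in KVM */
	unsigned int prod;	/* entries pushed so far */
	unsigned int reset;	/* entries already harvested and reset */
};

static unsigned int toy_ring_free(const struct toy_dirty_ring *r)
{
	return r->size - (r->prod - r->reset);
}

/* Would dirtying n_pages in one ioctl overflow the ring? This is
 * the bound an in-kernel user of the ring would have to respect. */
static bool toy_ring_can_take(const struct toy_dirty_ring *r,
			      unsigned int n_pages)
{
	return n_pages <= toy_ring_free(r);
}
```

With the bitmap fallback, the last-phase sync simply sidesteps this check: a bitmap can absorb any number of dirty pages without a harvest in between.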
>
> Ok. I was thinking of using the bitmap to convey the dirty pages for
> this particular case, where we don't have a running vcpu. The concern I had
> is the natural difference between a ring and a bitmap: the ring buffer is
> discrete, compared to a bitmap. Besides, it seems a little strange to
> have two different sets of metadata tracking the same data (dirty pages).
>
> However, the bitmap is an easier way than a per-vm ring, whose constraints
> are just as Peter pointed out. So let's reuse the bitmap to
> convey the dirty pages for this particular case. I think the cost, an
> extra bitmap, is acceptable. For this, we need another capability
> (KVM_CAP_DIRTY_LOG_RING_BITMAP?) so that QEMU can collect the dirty
> bitmap in the last phase of migration.
>
> If we all agree on this, I can send another kernel patch to address
> it. QEMU still needs more patches before the feature can be supported.
>
I've drafted the following PATCH[v5 3/7] to reuse the bitmap for these particular
cases. The KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG ioctls are used to visit
the bitmap, and the new capability is advertised by KVM_CAP_DIRTY_LOG_RING_BITMAP.
Note that those two ioctls are currently disabled when the dirty ring is enabled;
we need to re-enable them accordingly.
PATCH[v5 3/7] KVM: x86: Use bitmap in ring-based dirty page tracking
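On the QEMU side, the last-phase collection would amount to walking the bitmap that KVM_GET_DIRTY_LOG fills in and feeding each dirty gfn into the RAMBlock bitmaps. A sketch of just the walk, with a hypothetical callback standing in for the real QEMU plumbing:

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Walk a dirty bitmap (as filled in by KVM_GET_DIRTY_LOG) and invoke
 * a callback for every dirty gfn in the slot. Returns the number of
 * dirty pages found. The callback is illustrative; real QEMU would
 * set the corresponding bits in its global and RAMBlock bitmaps. */
static unsigned long walk_dirty_bitmap(const unsigned long *bitmap,
				       unsigned long npages,
				       unsigned long base_gfn,
				       void (*cb)(unsigned long gfn,
						  void *opaque),
				       void *opaque)
{
	unsigned long i, count = 0;

	for (i = 0; i < npages; i++) {
		if (bitmap[i / BITS_PER_LONG] &
		    (1UL << (i % BITS_PER_LONG))) {
			if (cb)
				cb(base_gfn + i, opaque);
			count++;
		}
	}
	return count;
}
```

Since the non-vcpu dirty set is small and roughly constant (the VGIC/ITS tables), doing this walk once, when migration is about to complete, should be cheap.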
I would like to post v5 after someone reviews or acks the kvm/selftests part
of this series.
[...]
Thanks,
Gavin