From: Jason Wang <jasowang@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Christophe de Dinechin <dinechin@redhat.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Lei Cao <lei.cao@stratus.com>
Subject: Re: [PATCH RESEND v2 08/17] KVM: X86: Implement ring-based dirty memory tracking
Date: Wed, 25 Dec 2019 11:23:23 +0800
Message-ID: <14c2c6d3-00fc-1507-9dd3-c25605717d3d@redhat.com>
In-Reply-To: <20191224150836.GB3023@xz-x1>


On 2019/12/24 11:08 PM, Peter Xu wrote:
> On Tue, Dec 24, 2019 at 02:16:04PM +0800, Jason Wang wrote:
>>> +struct kvm_dirty_ring {
>>> +	u32 dirty_index;
>>
>> Does this always equal indices->avail_index?
> Yes, but here we keep dirty_index as the internal copy, so we never
> need to worry about illegal userspace writes to avail_index (the
> kernel never reads it back).


I get your point, but I'm not sure it's worth the bother. We faced a 
similar issue in virtio: used_idx is not expected to be written by 
userspace, and we simply add checks.

But anyway, I'm fine if you want to keep it (maybe with a comment to 
explain).


>
>>
>>> +	u32 reset_index;
>>> +	u32 size;
>>> +	u32 soft_limit;
>>> +	struct kvm_dirty_gfn *dirty_gfns;
>>> +	struct kvm_dirty_ring_indices *indices;
>>
>> Any reason to keep dirty gfns and indices in different places? I guess it is
>> because you want to map dirty_gfns as readonly page but I couldn't find such
>> codes...
> That's a good point!  We should actually map the dirty gfns as read
> only.  I've added the check, something like this:
>
> static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
> {
> 	struct kvm_vcpu *vcpu = file->private_data;
> 	unsigned long pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
>
> 	/* Refuse any writable mapping that covers dirty ring pages */
> 	if ((kvm_page_in_dirty_ring(vcpu->kvm, vma->vm_pgoff) ||
> 	     kvm_page_in_dirty_ring(vcpu->kvm, vma->vm_pgoff + pages - 1)) &&
> 	    vma->vm_flags & VM_WRITE)
> 		return -EINVAL;
>
> 	vma->vm_ops = &kvm_vcpu_vm_ops;
> 	return 0;
> }
>
> I also changed the test code to cover this case.
>
> [...]


Looks good.
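
(I assume kvm_page_in_dirty_ring() is a simple page-offset range check
against where the ring lives in the vcpu mmap area; a sketch, with
KVM_DIRTY_LOG_PAGE_OFFSET as the base being my assumption:

	/* Sketch: does this vcpu mmap page offset fall in the dirty ring? */
	static bool kvm_page_in_dirty_ring(struct kvm *kvm, unsigned long pgoff)
	{
		return pgoff >= KVM_DIRTY_LOG_PAGE_OFFSET &&
		       pgoff < KVM_DIRTY_LOG_PAGE_OFFSET +
			       kvm->dirty_ring_size / PAGE_SIZE;
	}

)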


>
>>> +struct kvm_dirty_ring_indices {
>>> +	__u32 avail_index; /* set by kernel */
>>> +	__u32 fetch_index; /* set by userspace */
>>
>> Would it be better to make those two cacheline aligned?
> Yes, Paolo should have mentioned that but I must have missed it!  I
> hope I didn't miss anything else.
>
> [...]
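
(A side note on the alignment: since this struct is UAPI, explicit
padding is presumably safer than compiler attributes. A sketch, with
the 64-byte cacheline size being my assumption:

	struct kvm_dirty_ring_indices {
		__u32 avail_index;  /* set by kernel */
		__u32 padding1[15]; /* pad to a 64-byte cacheline */
		__u32 fetch_index;  /* set by userspace */
		__u32 padding2[15];
	};

The exact layout would of course be up to the final ABI.)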
>
>>> +int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
>>> +{
>>> +	u32 cur_slot, next_slot;
>>> +	u64 cur_offset, next_offset;
>>> +	unsigned long mask;
>>> +	u32 fetch;
>>> +	int count = 0;
>>> +	struct kvm_dirty_gfn *entry;
>>> +	struct kvm_dirty_ring_indices *indices = ring->indices;
>>> +	bool first_round = true;
>>> +
>>> +	fetch = READ_ONCE(indices->fetch_index);
>>> +
>>> +	/*
>>> +	 * Note that fetch_index is written by userspace, which
>>> +	 * should not be trusted.  If this check fails, it most
>>> +	 * likely means userspace has written a bogus fetch_index.
>>> +	 */
>>> +	if (fetch - ring->reset_index > ring->size)
>>> +		return -EINVAL;
>>> +
>>> +	if (fetch == ring->reset_index)
>>> +		return 0;
>>> +
>>> +	/* This is only needed to make compilers happy */
>>> +	cur_slot = cur_offset = mask = 0;
>>> +	while (ring->reset_index != fetch) {
>>> +		entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
>>> +		next_slot = READ_ONCE(entry->slot);
>>> +		next_offset = READ_ONCE(entry->offset);
>>> +		ring->reset_index++;
>>> +		count++;
>>> +		/*
>>> +		 * Try to coalesce the reset operations when the guest is
>>> +		 * scanning pages in the same slot.
>>> +		 */
>>> +		if (!first_round && next_slot == cur_slot) {
>>
>> initialize cur_slot to -1 then we can drop first_round here?
> cur_slot is unsigned.  We could force cur_slot to be s64, but maybe
> we can also simply keep first_round, whose name keeps the intent clear.
>
> [...]


Sure.
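
(For completeness, the variant I had in mind looks roughly like this;
a sketch only, reusing the patch's locals:

	s64 cur_slot = -1;	/* -1 == no previous entry yet */

	while (ring->reset_index != fetch) {
		entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
		next_slot = READ_ONCE(entry->slot);
		/*
		 * next_slot is u32 and promotes to s64 in the
		 * comparison, so it can never equal -1: the first
		 * iteration naturally takes the "new slot" path
		 * without a first_round flag.
		 */
		if (next_slot == cur_slot) {
			/* coalesce, as in the patch */
		}
		/* rest of the loop body as in the patch */
	}

But agreed, first_round reads clearly enough.)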


>
>>> +int kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
>>> +{
>>> +	struct kvm_dirty_gfn *entry;
>>> +	struct kvm_dirty_ring_indices *indices = ring->indices;
>>> +
>>> +	/*
>>> +	 * Note: without a vcpu context we refuse to push already at
>>> +	 * soft-full, because we can't risk making the ring completely
>>> +	 * full: a vcpu could push right after us, and a vcpu that
>>> +	 * finds the ring completely full could deadlock, e.g. when
>>> +	 * waiting with mmu_lock held.
>>> +	 */
>>> +	if (kvm_get_running_vcpu() == NULL &&
>>> +	    kvm_dirty_ring_soft_full(ring))
>>> +		return -EBUSY;
>>> +
>>> +	/* The ring will never get completely full with a vcpu context */
>>> +	WARN_ON_ONCE(kvm_dirty_ring_full(ring));
>>> +
>>> +	entry = &ring->dirty_gfns[ring->dirty_index & (ring->size - 1)];
>>> +	entry->slot = slot;
>>> +	entry->offset = offset;
>>> +	smp_wmb();
>>
>> Better to add a comment to explain this barrier, e.g. what it pairs with.
> Will do.
>
>>
>>> +	ring->dirty_index++;
>>> +	WRITE_ONCE(indices->avail_index, ring->dirty_index);
>>
>> Is WRITE_ONCE() a must here?
> I think not, but it seems clearer that we're publishing something
> explicitly to userspace.  Since you asked: I'm actually curious
> whether immediate memory writes like this could start to affect perf,
> judging from any of your previous perf work?


I've never measured the impact of a specific WRITE_ONCE(), but we don't 
do this in virtio/vhost. Maybe the maintainers can give more comments on 
this.
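
For the record, the smp_wmb() above would pair with an acquire (or
read barrier) on the consumer side in userspace, along these lines (a
sketch; collect(), gfns, size and fetch are illustrative names):

	/*
	 * Consumer sketch: the acquire load of avail_index pairs with
	 * the kernel's smp_wmb() before publishing it, so the
	 * slot/offset reads below see fully-written entries.
	 */
	uint32_t avail = __atomic_load_n(&indices->avail_index, __ATOMIC_ACQUIRE);

	while (fetch != avail) {
		struct kvm_dirty_gfn *e = &gfns[fetch & (size - 1)];
		collect(e->slot, e->offset);
		fetch++;
	}
	/* Release: the kernel must see fetch_index only after we're done. */
	__atomic_store_n(&indices->fetch_index, fetch, __ATOMIC_RELEASE);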

Thanks


>
> Thanks,
>



Thread overview: 45+ messages
2019-12-21  1:49 [PATCH RESEND v2 00/17] KVM: Dirty ring interface Peter Xu
2019-12-21  1:49 ` [PATCH RESEND v2 01/17] KVM: Remove kvm_read_guest_atomic() Peter Xu
2020-01-08 17:45   ` Paolo Bonzini
2019-12-21  1:49 ` [PATCH RESEND v2 02/17] KVM: X86: Change parameter for fast_page_fault tracepoint Peter Xu
2020-01-08 17:46   ` Paolo Bonzini
2019-12-21  1:49 ` [PATCH RESEND v2 03/17] KVM: X86: Don't track dirty for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR] Peter Xu
2019-12-21 13:51   ` Paolo Bonzini
2019-12-23 17:27     ` Peter Xu
2019-12-23 17:59       ` Paolo Bonzini
2019-12-23 20:10         ` Peter Xu
2020-01-08 17:46           ` Paolo Bonzini
2020-01-08 19:15             ` Peter Xu
2020-01-08 19:44               ` Paolo Bonzini
2020-01-08 21:02                 ` Peter Xu
2019-12-21  1:49 ` [PATCH RESEND v2 04/17] KVM: Cache as_id in kvm_memory_slot Peter Xu
2020-01-08 17:47   ` Paolo Bonzini
2019-12-21  1:49 ` [PATCH RESEND v2 05/17] KVM: Add build-time error check on kvm_run size Peter Xu
2019-12-21  1:49 ` [PATCH RESEND v2 06/17] KVM: Pass in kvm pointer into mark_page_dirty_in_slot() Peter Xu
2020-01-08 17:47   ` Paolo Bonzini
2019-12-21  1:49 ` [PATCH RESEND v2 07/17] KVM: Move running VCPU from ARM to common code Peter Xu
2020-01-08 17:47   ` Paolo Bonzini
2019-12-21  1:49 ` [PATCH RESEND v2 08/17] KVM: X86: Implement ring-based dirty memory tracking Peter Xu
2019-12-24  6:16   ` Jason Wang
2019-12-24 15:08     ` Peter Xu
2019-12-25  3:23       ` Jason Wang [this message]
2020-01-08 15:52   ` Peter Xu
2020-01-08 17:41     ` Paolo Bonzini
2020-01-08 19:06       ` Peter Xu
2020-01-08 19:44         ` Paolo Bonzini
2020-01-08 19:59           ` Peter Xu
2020-01-08 20:06             ` Paolo Bonzini
2019-12-21  1:49 ` [PATCH RESEND v2 09/17] KVM: Make dirty ring exclusive to dirty bitmap log Peter Xu
2019-12-21  1:58 ` [PATCH RESEND v2 10/17] KVM: Don't allocate dirty bitmap if dirty ring is enabled Peter Xu
2019-12-21  2:04 ` [PATCH RESEND v2 11/17] KVM: selftests: Always clear dirty bitmap after iteration Peter Xu
2019-12-21  2:04 ` [PATCH RESEND v2 12/17] KVM: selftests: Sync uapi/linux/kvm.h to tools/ Peter Xu
2019-12-21  2:04 ` [PATCH RESEND v2 13/17] KVM: selftests: Use a single binary for dirty/clear log test Peter Xu
2019-12-21  2:04 ` [PATCH RESEND v2 14/17] KVM: selftests: Introduce after_vcpu_run hook for dirty " Peter Xu
2019-12-21  2:04 ` [PATCH RESEND v2 15/17] KVM: selftests: Add dirty ring buffer test Peter Xu
2019-12-24  6:18   ` Jason Wang
2019-12-24 15:22     ` Peter Xu
2019-12-24  6:50   ` Jason Wang
2019-12-24 15:24     ` Peter Xu
2019-12-21  2:04 ` [PATCH RESEND v2 16/17] KVM: selftests: Let dirty_log_test async for dirty ring test Peter Xu
2019-12-21  2:04 ` [PATCH RESEND v2 17/17] KVM: selftests: Add "-c" parameter to dirty log test Peter Xu
2019-12-24  6:34 ` [PATCH RESEND v2 00/17] KVM: Dirty ring interface Jason Wang
