kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Peter Xu <peterx@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>
Subject: Re: [PATCH RFC 04/15] KVM: Implement ring-based dirty memory tracking
Date: Wed, 4 Dec 2019 11:05:47 +0100	[thread overview]
Message-ID: <374f18f1-0592-9b70-adbb-0a72cc77d426@redhat.com> (raw)
In-Reply-To: <20191203184600.GB19877@linux.intel.com>

On 03/12/19 19:46, Sean Christopherson wrote:
> On Tue, Dec 03, 2019 at 02:48:10PM +0100, Paolo Bonzini wrote:
>> On 02/12/19 22:50, Sean Christopherson wrote:
>>>>
>>>> I discussed this with Paolo, but I think Paolo preferred the per-vm
>>>> ring because there's no good reason to choose vcpu0 as what (1)
>>>> suggested.  While if to choose (2) we probably need to lock even for
>>>> per-cpu ring, so could be a bit slower.
>>> Ya, per-vm is definitely better than dumping on vcpu0.  I'm hoping we can
>>> find a third option that provides comparable performance without using any
>>> per-vcpu rings.
>>>
>>
>> The advantage of per-vCPU rings is that it naturally: 1) parallelizes
>> the processing of dirty pages; 2) makes userspace vCPU thread do more
>> work on vCPUs that dirty more pages.
>>
>> I agree that on the producer side we could reserve multiple entries in
>> the case of PML (and without PML only one entry should be added at a
>> time).  But I'm afraid that things get ugly when the ring is full,
>> because you'd have to wait for all vCPUs to finish publishing the
>> entries they have reserved.
> 
> Ah, I take it the intended model is that userspace will only start pulling
> entries off the ring when KVM explicitly signals that the ring is "full"?

No, it's not.  But perhaps in the asynchronous case you can delay
pushing the reserved entries to the consumer until a moment where no
CPUs have left empty slots in the ring buffer (somebody must have done
multi-producer ring buffers before).  In the ring-full case that is
harder because it requires synchronization.

> Rather than reserve entries, what if vCPUs reserved an entire ring?  Create
> a pool of N=nr_vcpus rings that are shared by all vCPUs.  To mark pages
> dirty, a vCPU claims a ring, pushes the pages into the ring, and then
> returns the ring to the pool.  If pushing pages hits the soft limit, a
> request is made to drain the ring and the ring is not returned to the pool
> until it is drained.
> 
> Except for acquiring a ring, which likely can be heavily optimized, that'd
> allow parallel processing (#1), and would provide a facsimile of #2 as
> pushing more pages onto a ring would naturally increase the likelihood of
> triggering a drain.  And it might be interesting to see the effect of using
> different methods of ring selection, e.g. pure round robin, LRU, last used
> on the current vCPU, etc...

If you are creating nr_vcpus rings, and draining is done on the vCPU
thread that has filled the ring, why not create nr_vcpus+1?  The current
code then is exactly the same as pre-claiming a ring per vCPU and never
releasing it, and using a spinlock to claim the per-VM ring.

However, we could build on top of my other suggestion to add
slot->as_id, and wrap kvm_get_running_vcpu() with a nice API, mimicking
exactly what you've suggested.  Maybe even add a scary comment around
kvm_get_running_vcpu() suggesting that users only do so to avoid locking
and wrap it with a nice API.  Similar to what get_cpu/put_cpu do with
smp_processor_id.

1) Add a pointer from struct kvm_dirty_ring to struct
kvm_dirty_ring_indexes:

vcpu->dirty_ring->data = &vcpu->run->vcpu_ring_indexes;
kvm->vm_dirty_ring->data = *kvm->vm_run->vm_ring_indexes;

2) push the ring choice and locking to two new functions

struct kvm_ring *kvm_get_dirty_ring(struct kvm *kvm)
{
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

	if (vcpu && !WARN_ON_ONCE(vcpu->kvm != kvm)) {
		return &vcpu->dirty_ring;
	} else {
		/*
		 * Put onto per vm ring because no vcpu context.
		 * We'll kick vcpu0 if ring is full.
		 */
		spin_lock(&kvm->vm_dirty_ring->lock);
		return &kvm->vm_dirty_ring;
	}
}

void kvm_put_dirty_ring(struct kvm *kvm,
			struct kvm_dirty_ring *ring)
{
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
	bool full = kvm_dirty_ring_used(ring) >= ring->soft_limit;

	if (ring == &kvm->vm_dirty_ring) {
		if (vcpu == NULL)
			vcpu = kvm->vcpus[0];
		spin_unlock(&kvm->vm_dirty_ring->lock);
	}

	if (full)
		kvm_make_request(KVM_REQ_DIRTY_RING_FULL, vcpu);
}

3) simplify kvm_dirty_ring_push to

void kvm_dirty_ring_push(struct kvm_dirty_ring *ring,
			 u32 slot, u64 offset)
{
	/* left as an exercise to the reader */
}

and mark_page_dirty_in_ring to

static void mark_page_dirty_in_ring(struct kvm *kvm,
				    struct kvm_memory_slot *slot,
				    gfn_t gfn)
{
	struct kvm_dirty_ring *ring;

	if (!kvm->dirty_ring_size)
		return;

	ring = kvm_get_dirty_ring(kvm);
	kvm_dirty_ring_push(ring, (slot->as_id << 16) | slot->id,
			    gfn - slot->base_gfn);
	kvm_put_dirty_ring(kvm, ring);
}

Paolo

>> It's ugly that we _also_ need a per-VM ring, but unfortunately some
>> operations do not really have a vCPU that they can refer to.
> 


  reply	other threads:[~2019-12-04 10:05 UTC|newest]

Thread overview: 121+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-29 21:34 [PATCH RFC 00/15] KVM: Dirty ring interface Peter Xu
2019-11-29 21:34 ` [PATCH RFC 01/15] KVM: Move running VCPU from ARM to common code Peter Xu
2019-12-03 19:01   ` Sean Christopherson
2019-12-04  9:42     ` Paolo Bonzini
2019-12-09 22:05       ` Peter Xu
2019-11-29 21:34 ` [PATCH RFC 02/15] KVM: Add kvm/vcpu argument to mark_dirty_page_in_slot Peter Xu
2019-12-02 19:32   ` Sean Christopherson
2019-12-02 20:49     ` Peter Xu
2019-11-29 21:34 ` [PATCH RFC 03/15] KVM: Add build-time error check on kvm_run size Peter Xu
2019-12-02 19:30   ` Sean Christopherson
2019-12-02 20:53     ` Peter Xu
2019-12-02 22:19       ` Sean Christopherson
2019-12-02 22:40         ` Peter Xu
2019-12-03  5:50           ` Sean Christopherson
2019-12-03 13:41         ` Paolo Bonzini
2019-12-03 17:04           ` Peter Xu
2019-11-29 21:34 ` [PATCH RFC 04/15] KVM: Implement ring-based dirty memory tracking Peter Xu
2019-12-02 20:10   ` Sean Christopherson
2019-12-02 21:16     ` Peter Xu
2019-12-02 21:50       ` Sean Christopherson
2019-12-02 23:09         ` Peter Xu
2019-12-03 13:48         ` Paolo Bonzini
2019-12-03 18:46           ` Sean Christopherson
2019-12-04 10:05             ` Paolo Bonzini [this message]
2019-12-07  0:29               ` Sean Christopherson
2019-12-09  9:37                 ` Paolo Bonzini
2019-12-09 21:54               ` Peter Xu
2019-12-10 10:07                 ` Paolo Bonzini
2019-12-10 15:52                   ` Peter Xu
2019-12-10 17:09                     ` Paolo Bonzini
2019-12-15 17:21                       ` Peter Xu
2019-12-16 10:08                         ` Paolo Bonzini
2019-12-16 18:54                           ` Peter Xu
2019-12-17  9:01                             ` Paolo Bonzini
2019-12-17 16:24                               ` Peter Xu
2019-12-17 16:28                                 ` Paolo Bonzini
2019-12-18 21:58                                   ` Peter Xu
2019-12-18 22:24                                     ` Sean Christopherson
2019-12-18 22:37                                       ` Paolo Bonzini
2019-12-18 22:49                                         ` Peter Xu
2019-12-17  2:28                           ` Tian, Kevin
2019-12-17 16:18                             ` Alex Williamson
2019-12-17 16:30                               ` Paolo Bonzini
2019-12-18  0:29                                 ` Tian, Kevin
     [not found]                           ` <AADFC41AFE54684AB9EE6CBC0274A5D19D645E5F@SHSMSX104.ccr.corp.intel.com>
2019-12-17  5:17                             ` Tian, Kevin
2019-12-17  5:25                               ` Yan Zhao
2019-12-17 16:24                                 ` Alex Williamson
2019-12-03 19:13   ` Sean Christopherson
2019-12-04 10:14     ` Paolo Bonzini
2019-12-04 14:33       ` Sean Christopherson
2019-12-04 10:38   ` Jason Wang
2019-12-04 11:04     ` Paolo Bonzini
2019-12-04 19:52       ` Peter Xu
2019-12-05  6:51         ` Jason Wang
2019-12-05 12:08           ` Peter Xu
2019-12-05 13:12             ` Jason Wang
2019-12-10 13:25       ` Michael S. Tsirkin
2019-12-10 13:31         ` Paolo Bonzini
2019-12-10 16:02           ` Peter Xu
2019-12-10 21:53             ` Michael S. Tsirkin
2019-12-11  9:05               ` Paolo Bonzini
2019-12-11 13:04                 ` Michael S. Tsirkin
2019-12-11 14:54                   ` Peter Xu
2019-12-10 21:48           ` Michael S. Tsirkin
2019-12-11 12:53   ` Michael S. Tsirkin
2019-12-11 14:14     ` Paolo Bonzini
2019-12-11 20:59     ` Peter Xu
2019-12-11 22:57       ` Michael S. Tsirkin
2019-12-12  0:08         ` Paolo Bonzini
2019-12-12  7:36           ` Michael S. Tsirkin
2019-12-12  8:12             ` Paolo Bonzini
2019-12-12 10:38               ` Michael S. Tsirkin
2019-12-15 17:33           ` Peter Xu
2019-12-16  9:47             ` Michael S. Tsirkin
2019-12-16 15:07               ` Peter Xu
2019-12-16 15:33                 ` Michael S. Tsirkin
2019-12-16 15:47                   ` Peter Xu
2019-12-11 17:24   ` Christophe de Dinechin
2019-12-13 20:23     ` Peter Xu
2019-12-14  7:57       ` Paolo Bonzini
2019-12-14 16:26         ` Peter Xu
2019-12-16  9:29           ` Paolo Bonzini
2019-12-16 15:26             ` Peter Xu
2019-12-16 15:31               ` Paolo Bonzini
2019-12-16 15:43                 ` Peter Xu
2019-12-17 12:16         ` Christophe de Dinechin
2019-12-17 12:19           ` Paolo Bonzini
2019-12-17 15:38             ` Peter Xu
2019-12-17 16:31               ` Paolo Bonzini
2019-12-17 16:42                 ` Peter Xu
2019-12-17 16:48                   ` Paolo Bonzini
2019-12-17 19:41                     ` Peter Xu
2019-12-18  0:33                       ` Paolo Bonzini
2019-12-18 16:32                         ` Peter Xu
2019-12-18 16:41                           ` Paolo Bonzini
2019-12-20 18:19       ` Peter Xu
2019-11-29 21:34 ` [PATCH RFC 05/15] KVM: Make dirty ring exclusive to dirty bitmap log Peter Xu
2019-11-29 21:34 ` [PATCH RFC 06/15] KVM: Introduce dirty ring wait queue Peter Xu
2019-11-29 21:34 ` [PATCH RFC 07/15] KVM: X86: Implement ring-based dirty memory tracking Peter Xu
2019-11-29 21:34 ` [PATCH RFC 08/15] KVM: selftests: Always clear dirty bitmap after iteration Peter Xu
2019-11-29 21:34 ` [PATCH RFC 09/15] KVM: selftests: Sync uapi/linux/kvm.h to tools/ Peter Xu
2019-11-29 21:35 ` [PATCH RFC 10/15] KVM: selftests: Use a single binary for dirty/clear log test Peter Xu
2019-11-29 21:35 ` [PATCH RFC 11/15] KVM: selftests: Introduce after_vcpu_run hook for dirty " Peter Xu
2019-11-29 21:35 ` [PATCH RFC 12/15] KVM: selftests: Add dirty ring buffer test Peter Xu
2019-11-29 21:35 ` [PATCH RFC 13/15] KVM: selftests: Let dirty_log_test async for dirty ring test Peter Xu
2019-11-29 21:35 ` [PATCH RFC 14/15] KVM: selftests: Add "-c" parameter to dirty log test Peter Xu
2019-11-29 21:35 ` [PATCH RFC 15/15] KVM: selftests: Test dirty ring waitqueue Peter Xu
2019-11-30  8:29 ` [PATCH RFC 00/15] KVM: Dirty ring interface Paolo Bonzini
2019-12-02  2:13   ` Peter Xu
2019-12-03 13:59     ` Paolo Bonzini
2019-12-05 19:30       ` Peter Xu
2019-12-05 19:59         ` Paolo Bonzini
2019-12-05 20:52           ` Peter Xu
2019-12-02 20:21   ` Sean Christopherson
2019-12-02 20:43     ` Peter Xu
2019-12-04 10:39 ` Jason Wang
2019-12-04 19:33   ` Peter Xu
2019-12-05  6:49     ` Jason Wang
2019-12-11 13:41 ` Christophe de Dinechin
2019-12-11 14:16   ` Paolo Bonzini
2019-12-11 17:15     ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=374f18f1-0592-9b70-adbb-0a72cc77d426@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterx@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).