From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH RFC 9/9] KVM: Dirty ring support
Date: Thu, 26 Mar 2020 14:14:36 +0000
Message-ID: <20200326141436.GD2713@work-vm>
In-Reply-To: <20200325213237.GG404034@xz-x1>

* Peter Xu (peterx@redhat.com) wrote:
> On Wed, Mar 25, 2020 at 08:41:44PM +0000, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > +enum KVMReaperState {
> > > +    KVM_REAPER_NONE = 0,
> > > +    /* The reaper is sleeping */
> > > +    KVM_REAPER_WAIT,
> > > +    /* The reaper is reaping for dirty pages */
> > > +    KVM_REAPER_REAPING,
> > > +};
> > 
> > That probably needs to be KVMDirtyRingReaperState
> > given there are many things that could be reaped.
> 
> Sure.
> 
> > 
> > > +/*
> > > + * KVM reaper instance, responsible for collecting the KVM dirty bits
> > > + * via the dirty ring.
> > > + */
> > > +struct KVMDirtyRingReaper {
> > > +    /* The reaper thread */
> > > +    QemuThread reaper_thr;
> > > +    /*
> > > +     * Telling the reaper thread to wakeup.  This should be used as a
> > > +     * generic interface to kick the reaper thread, like, in vcpu
> > >      * threads where it gets an exit due to ring full.
> > > +     */
> > > +    EventNotifier reaper_event;
> > 
> > I think I'd just use a simple semaphore for this type of thing.
> 
> I'm actually uncertain which is cheaper...
> 
> In the meantime, I wanted to poll two handles at the same time below
> (in kvm_dirty_ring_reaper_thread).  I don't know how to do that with
> a semaphore - could it do that?

If you're OK with EventNotifier stick with it; it's just that I'm used
to doing it with a semaphore, e.g. a flag then the semaphore - but
that's fine.

> [...]
> 
> > > @@ -412,6 +460,18 @@ int kvm_init_vcpu(CPUState *cpu)
> > >              (void *)cpu->kvm_run + s->coalesced_mmio * PAGE_SIZE;
> > >      }
> > >  
> > > +    if (s->kvm_dirty_gfn_count) {
> > > +        cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_size,
> > > +                                   PROT_READ | PROT_WRITE, MAP_SHARED,
> > 
> > Is the MAP_SHARED required?
> 
> Yes it's required.  It's the same as when we map the per-vcpu kvm_run.
> 
> If we used MAP_PRIVATE, it would be in a COW fashion - the first time
> userspace writes to the dirty gfns, it would copy the current dirty
> ring page in the kernel, and from then on QEMU would never be able to
> see what the kernel writes to the dirty gfn pages.  MAP_SHARED means
> userspace and the kernel share exactly the same page(s).

OK, worth a comment.
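
Something along these lines above the mmap() would do (the wording is
just a suggestion):

    /*
     * The dirty gfn pages must be mapped MAP_SHARED: the kernel keeps
     * writing entries into them, and a MAP_PRIVATE mapping would COW
     * the pages on our first write, after which we'd never see the
     * kernel's updates again.
     */
    cpu->kvm_dirty_gfns = mmap(NULL, s->kvm_dirty_ring_size,
                               PROT_READ | PROT_WRITE, MAP_SHARED,
                               cpu->kvm_fd,
                               PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);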

> > 
> > > +                                   cpu->kvm_fd,
> > > +                                   PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
> > > +        if (cpu->kvm_dirty_gfns == MAP_FAILED) {
> > > +            ret = -errno;
> > > +            DPRINTF("mmap'ing vcpu dirty gfns failed\n");
> > 
> > Include errno?
> 
> Will do.
> 
> [...]
> 
> > > +static uint64_t kvm_dirty_ring_reap(KVMState *s)
> > > +{
> > > +    KVMMemoryListener *kml;
> > > +    int ret, i, locked_count = s->nr_as;
> > > +    CPUState *cpu;
> > > +    uint64_t total = 0;
> > > +
> > > +    /*
> > > +     * We need to lock all kvm slots for all address spaces here,
> > > +     * because:
> > > +     *
> > > +     * (1) We need to mark dirty for dirty bitmaps in multiple slots
> > > +     *     and for tons of pages, so it's better to take the lock here
> > > +     *     once rather than once per page.  And more importantly,
> > > +     *
> > > +     * (2) We must _NOT_ publish dirty bits to the other threads
> > > +     *     (e.g., the migration thread) via the kvm memory slot dirty
> > > +     *     bitmaps before correctly re-protect those dirtied pages.
> > > +     *     Otherwise we can have potential risk of data corruption if
> > > +     *     the page data is read in the other thread before we do
> > > +     *     reset below.
> > > +     */
> > > +    for (i = 0; i < s->nr_as; i++) {
> > > +        kml = s->as[i].ml;
> > > +        if (!kml) {
> > > +            /*
> > > +             * This is tricky - we grow s->as[] dynamically now.  Take
> > > +             * care of that case.  We also assume the as[] will
> > > +             * fill in one by one starting from zero.  Without this, we race
> > > +             * with register_smram_listener.
> > > +             *
> > > +             * TODO: make all these prettier...
> > > +             */
> > > +            locked_count = i;
> > > +            break;
> > > +        }
> > > +        kvm_slots_lock(kml);
> > > +    }
> > > +
> > > +    CPU_FOREACH(cpu) {
> > > +        total += kvm_dirty_ring_reap_one(s, cpu);
> > > +    }
> > > +
> > > +    if (total) {
> > > +        ret = kvm_vm_ioctl(s, KVM_RESET_DIRTY_RINGS);
> > > +        assert(ret == total);
> > > +    }
> > > +
> > > +    /* Unlock whatever locks that we have locked */
> > > +    for (i = 0; i < locked_count; i++) {
> > > +        kvm_slots_unlock(s->as[i].ml);
> > > +    }
> > > +
> > > +    CPU_FOREACH(cpu) {
> > > +        if (cpu->kvm_dirty_ring_full) {
> > > +            qemu_sem_post(&cpu->kvm_dirty_ring_avail);
> > > +        }
> > 
> > Why do you need to wait until here - couldn't you release
> > each vcpu after you've reaped it?
> 
> We probably still need to wait.  Even after we've reaped all the
> dirty bits we've only marked the pages as "collected"; the buffers
> won't be available again until the kernel re-protects those pages
> (i.e. when the above KVM_RESET_DIRTY_RINGS completes).  Before that,
> resuming the vcpu could just make it exit again with the same
> ring-full event.

Ah OK.
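
So presumably the vcpu side of that handshake looks roughly like this
(a sketch pieced together from what's quoted here, not the actual
patch):

    case KVM_EXIT_DIRTY_RING_FULL:
        cpu->kvm_dirty_ring_full = true;
        /* Kick the reaper... */
        event_notifier_set(&s->reaper.reaper_event);
        /* ...and sleep until KVM_RESET_DIRTY_RINGS has completed */
        qemu_sem_wait(&cpu->kvm_dirty_ring_avail);
        cpu->kvm_dirty_ring_full = false;
        break;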

> [...]
> 
> > > +static int kvm_dirty_ring_reaper_init(KVMState *s)
> > > +{
> > > +    struct KVMDirtyRingReaper *r = &s->reaper;
> > > +    int ret;
> > > +
> > > +    ret = event_notifier_init(&r->reaper_event, false);
> > > +    assert(ret == 0);
> > > +    ret = event_notifier_init(&r->reaper_flush_event, false);
> > > +    assert(ret == 0);
> > > +    qemu_sem_init(&r->reaper_flush_sem, 0);
> > > +
> > > +    qemu_thread_create(&r->reaper_thr, "kvm-reaper",
> > > +                       kvm_dirty_ring_reaper_thread,
> > > +                       s, QEMU_THREAD_JOINABLE);
> > 
> > That's marked as joinable - does it ever get joined?
> > If the reaper thread does exit on error (I can only see the poll
> > failure?) - what happens elsewhere - will things still try and kick it?
> 
> The reaper thread is designed never to fail.  If it does fail, it'll
> exit without being joined; the alternative would be to abort()
> directly in the thread, which doesn't seem any better...
> 
> Regarding "why not join() it": I think it's simply because we have no
> operation corresponding to AccelClass.init_machine() and
> kvm_init(). :-) From that POV, I think I'll still use JOINABLE in
> the hope that someday the destroy_machine() hook will be ready and
> we'll be able to join it.

OK, that's fine.
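
When that hook does land, the teardown should be straightforward -
something like this (hypothetical, including the reaper_quit flag,
since none of it exists yet):

    static void kvm_dirty_ring_reaper_exit(KVMState *s)
    {
        /* Ask the thread to quit, then kick it out of its wait */
        atomic_set(&s->reaper.reaper_quit, true);
        event_notifier_set(&s->reaper.reaper_event);
        qemu_thread_join(&s->reaper.reaper_thr);
    }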

Dave

> Thanks,
> 
> -- 
> Peter Xu
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


