From: Peter Xu <peterx@redhat.com>
To: Anish Moorthy <amoorthy@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	maz@kernel.org, oliver.upton@linux.dev,
	Sean Christopherson <seanjc@google.com>,
	James Houghton <jthoughton@google.com>,
	bgardon@google.com, dmatlack@google.com, ricarkol@google.com,
	kvm <kvm@vger.kernel.org>,
	kvmarm@lists.linux.dev
Subject: Re: [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults.
Date: Wed, 3 May 2023 17:27:00 -0400	[thread overview]
Message-ID: <ZFLRpEV09lrpJqua@x1n> (raw)
In-Reply-To: <ZFLPlRReglM/Vgfu@x1n>

Oops, bounced back from the list..

Forwarding with no attachments this time - I assume the paragraphs still
carry enough information even without the flamegraphs.  Sorry for the
noise.

On Wed, May 03, 2023 at 05:18:13PM -0400, Peter Xu wrote:
> On Wed, May 03, 2023 at 12:45:07PM -0700, Anish Moorthy wrote:
> > On Thu, Apr 27, 2023 at 1:26 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Thanks (for doing this test, and also to Nadav for all his inputs), and
> > > sorry for the late response.
> > 
> > No need to apologize: anyway, I've got you comfortably beat on being
> > late at this point :)
> > 
> > > These numbers caught my eye, and I'm very curious why even 2 vcpus can
> > > scale that badly.
> > >
> > > I gave it a shot on a test machine and I got something slightly different:
> > >
> > >   Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores, 40 threads)
> > >   $ ./demand_paging_test -b 512M -u MINOR -s shmem -v N
> > >   |-------+----------+--------|
> > >   | n_thr | per-vcpu | total  |
> > >   |-------+----------+--------|
> > >   |     1 | 39.5K    | 39.5K  |
> > >   |     2 | 33.8K    | 67.6K  |
> > >   |     4 | 31.8K    | 127.2K |
> > >   |     8 | 30.8K    | 246.1K |
> > >   |    16 | 21.9K    | 351.0K |
> > >   |-------+----------+--------|
> > >
> > > I used larger RAM due to fewer cores.  I didn't try 32+ vcpus, to make
> > > sure I don't already have two threads contending on a core/thread since
> > > I only have 40 hardware threads there, but we can still compare with
> > > your lower half.
> > >
> > > While testing I noticed bad numbers and another bug where NSEC_PER_SEC
> > > wasn't used properly, so I applied this before the test:
> > >
> > > https://lore.kernel.org/all/20230427201112.2164776-1-peterx@redhat.com/
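> > >
> > > Just to illustrate the kind of arithmetic involved (a generic sketch
> > > only, not the actual patch above): a per-second rate derived from a
> > > struct timespec has to be scaled by NSEC_PER_SEC consistently, e.g.:
> > >
> > >   #include <stdint.h>
> > >   #include <time.h>
> > >
> > >   #define NSEC_PER_SEC 1000000000ULL
> > >
> > >   /* Hypothetical helper, not the selftest code: convert an elapsed
> > >    * struct timespec into a pages-per-second rate.  Scaling only
> > >    * tv_sec, or dividing before multiplying, skews the reported rate. */
> > >   static uint64_t pages_per_sec(uint64_t pages, struct timespec elapsed)
> > >   {
> > >           uint64_t ns = (uint64_t)elapsed.tv_sec * NSEC_PER_SEC +
> > >                         elapsed.tv_nsec;
> > >
> > >           /* Multiply first to keep precision, then divide by ns. */
> > >           return ns ? pages * NSEC_PER_SEC / ns : 0;
> > >   }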
> > >
> > > I think it means it still doesn't scale that well, but it's not so bad
> > > either - no obvious 1/2 drop when using 2 vcpus.  There are still a
> > > bunch of paths triggered in the test, so I also don't expect it to
> > > scale fully linearly.  I just didn't see a drop as drastic as yours in
> > > my numbers.  I'm not sure whether that's simply broken test numbers,
> > > parameter differences (e.g. you used only 64M per vcpu), or hardware
> > > differences.
> > 
> > Hmm, I suspect we're dealing with hardware differences here. I rebased
> > my changes onto those two patches you sent, taking care not to clobber
> > them, but even with the repro command you provided my results look very
> > different from yours (at least on 1-4 vcpus) on the machine I've been
> > testing on (4x AMD EPYC 7B13 64-Core, 2.2GHz).
> > 
> > (n=20)
> > |-------+----------+-------|
> > | n_thr | per_vcpu | total |
> > |-------+----------+-------|
> > |     1 | 154K     | 154K  |
> > |     2 | 92K      | 184K  |
> > |     4 | 71K      | 285K  |
> > |     8 | 36K      | 291K  |
> > |    16 | 19K      | 310K  |
> > |-------+----------+-------|
> > 
> > Out of interest I tested on another machine (Intel(R) Xeon(R)
> > Platinum 8273CL CPU @ 2.20GHz) as well, and the results are a bit
> > different again:
> > 
> > (n=20)
> > |-------+----------+-------|
> > | n_thr | per_vcpu | total |
> > |-------+----------+-------|
> > |     1 | 115K     | 115K  |
> > |     2 | 103K     | 206K  |
> > |     4 | 65K      | 262K  |
> > |     8 | 39K      | 319K  |
> > |    16 | 19K      | 398K  |
> > |-------+----------+-------|
> 
> Interesting.
> 
> > 
> > It is interesting how all three sets of numbers start off different
> > but seem to converge around 16 vCPUs. I did check to make sure the
> > memory fault exits sped things up in all cases, and that at least
> > stays true.
> > 
> > By the way, I've got a little helper script that I've been using to
> > run/average the selftest results (which can vary quite a bit).  I've
> > attached it below - hopefully it doesn't bounce from the mailing list.
> > Just for reference, the invocation to test the command you provided is:
> > 
> > > python dp_runner.py --num_runs 20 --max_cores 16 --percpu_mem 512M
> 
> I found that indeed I shouldn't have stopped at 16 vcpus since that's
> exactly where it starts to bottleneck. :)
> 
> So out of curiosity I tried to profile the 32 vcpus case on my system
> with this test case, trying it both with:
> 
>   - 1 uffd + 8 readers
>   - 32 uffds (so 32 readers)
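> 
> For reference, the second setup roughly boils down to something like the
> sketch below (hypothetical code against the plain userfaultfd API, not
> the selftest itself): each vCPU's slice of guest memory gets its own uffd
> and its own reader thread, so faults on different slices land on
> different queues.
> 
>   #include <fcntl.h>
>   #include <linux/userfaultfd.h>
>   #include <sys/ioctl.h>
>   #include <sys/syscall.h>
>   #include <unistd.h>
> 
>   /* slice_base/slice_len are hypothetical parameters: the chunk of
>    * guest memory that one vCPU (and one reader thread) will own. */
>   static int uffd_for_slice(void *slice_base, size_t slice_len)
>   {
>           /* Ask for minor faults on shmem, matching "-u MINOR -s shmem". */
>           struct uffdio_api api = {
>                   .api = UFFD_API,
>                   .features = UFFD_FEATURE_MINOR_SHMEM,
>           };
>           struct uffdio_register reg = {
>                   .range = {
>                           .start = (unsigned long)slice_base,
>                           .len = slice_len,
>                   },
>                   .mode = UFFDIO_REGISTER_MODE_MINOR,
>           };
>           int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> 
>           if (uffd < 0)
>                   return -1;
>           if (ioctl(uffd, UFFDIO_API, &api) ||
>               ioctl(uffd, UFFDIO_REGISTER, &reg)) {
>                   close(uffd);
>                   return -1;
>           }
>           return uffd;    /* one dedicated reader then polls this fd */
>   }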
> 
> I've got the flamegraphs attached for both.
> 
> It seems that when using more than one uffd the bottleneck is not the
> spinlock anymore but something else.
> 
> From what I got there, vmx_vcpu_load() shows up more prominently than
> the spinlocks.  I think that's the TLB flush broadcast.
> 
> OTOH, when using 1 uffd we can indeed clearly see the overhead of
> spinlock contention on either the fault() path or read()/poll(), as
> you and James rightly pointed out.
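> 
> To make it concrete where that single queue bites, here is a hypothetical
> reader-thread loop (not the selftest code, and assuming the shared uffd
> is passed in via the pthread argument): every reader sleeps in
> poll()/read() on the same fd, and every faulting vCPU queues onto the
> same fault_pending list, so both sides go through that one lock.
> 
>   #include <poll.h>
>   #include <linux/userfaultfd.h>
>   #include <sys/ioctl.h>
>   #include <unistd.h>
> 
>   static void *uffd_reader(void *arg)
>   {
>           int uffd = *(int *)arg;         /* one fd shared by all readers */
>           struct pollfd pfd = { .fd = uffd, .events = POLLIN };
>           unsigned long pgsz = sysconf(_SC_PAGESIZE);
> 
>           for (;;) {
>                   struct uffd_msg msg;
>                   struct uffdio_continue cont;
> 
>                   if (poll(&pfd, 1, -1) <= 0)
>                           continue;
>                   if (read(uffd, &msg, sizeof(msg)) != sizeof(msg))
>                           continue;       /* e.g. another reader raced us */
>                   if (msg.event != UFFD_EVENT_PAGEFAULT)
>                           continue;
> 
>                   /* MINOR fault: the page is already in the page cache,
>                    * just install the mapping and wake the vCPU. */
>                   cont.range.start = msg.arg.pagefault.address & ~(pgsz - 1);
>                   cont.range.len = pgsz;
>                   cont.mode = 0;
>                   ioctl(uffd, UFFDIO_CONTINUE, &cont);
>           }
>           return NULL;
>   }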
> 
> I'm not sure whether my numbers are skewed by the setup, though.  After
> all I only have 40 hardware threads and I started 32 vcpus + 8 readers,
> so there's already contention between the workloads.
> 
> IMHO this means that there's still a chance to provide a more generic
> userfaultfd scaling solution, as long as we can remove the single
> spinlock contention on the fault/fault_pending queues.  I'll see whether
> I can explore this possibility a bit more and keep you all updated.  The
> general idea to me is still to make a multi-queue out of 1 uffd.
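> 
> Very roughly, the shape of the idea is the usual one-lock-per-bucket
> pattern (this is only a userspace-style sketch of the data structure,
> not actual kernel code): hash each fault onto one of several queues so
> that faulting threads and readers contend per bucket rather than on a
> single global lock.
> 
>   #include <pthread.h>
>   #include <stdint.h>
>   #include <stdlib.h>
> 
>   #define NR_QUEUES 16
> 
>   struct fault {                          /* stand-in for a queued fault */
>           uint64_t addr;
>           struct fault *next;
>   };
> 
>   struct fault_queue {
>           pthread_mutex_t lock;           /* per-bucket lock */
>           struct fault *head;
>   };
> 
>   static struct fault_queue queues[NR_QUEUES] = {
>           [0 ... NR_QUEUES - 1] = { .lock = PTHREAD_MUTEX_INITIALIZER },
>   };
> 
>   static struct fault_queue *pick_queue(uint64_t addr)
>   {
>           return &queues[(addr >> 12) % NR_QUEUES];   /* hash by page */
>   }
> 
>   /* Faulting side: only takes the lock of its own bucket. */
>   static int enqueue_fault(uint64_t addr)
>   {
>           struct fault_queue *q = pick_queue(addr);
>           struct fault *f = malloc(sizeof(*f));
> 
>           if (!f)
>                   return -1;
>           f->addr = addr;
>           pthread_mutex_lock(&q->lock);
>           f->next = q->head;
>           q->head = f;
>           pthread_mutex_unlock(&q->lock);
>           return 0;
>   }
> 
>   /* Reader side: likewise only contends within one bucket. */
>   static struct fault *dequeue_fault(struct fault_queue *q)
>   {
>           struct fault *f;
> 
>           pthread_mutex_lock(&q->lock);
>           f = q->head;
>           if (f)
>                   q->head = f->next;
>           pthread_mutex_unlock(&q->lock);
>           return f;
>   }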
> 
> I _think_ this might also be a positive result for your work, because if
> the bottleneck is not userfaultfd (when we scale it by creating multiple
> uffds; ignoring the split-vma effect), then it cannot be resolved by
> scaling userfaultfd alone anyway.  So a general solution, even if it
> existed, may not work here for KVM, because we'd already get stuck
> somewhere else.
> 
> -- 
> Peter Xu




-- 
Peter Xu


Thread overview: 105+ messages
2023-04-12 21:34 [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults Anish Moorthy
2023-04-12 21:34 ` [PATCH v3 01/22] KVM: selftests: Allow many vCPUs and reader threads per UFFD in demand paging test Anish Moorthy
2023-04-19 13:51   ` Hoo Robert
2023-04-20 17:55     ` Anish Moorthy
2023-04-21 12:15       ` Robert Hoo
2023-04-21 16:21         ` Anish Moorthy
2023-04-12 21:34 ` [PATCH v3 02/22] KVM: selftests: Use EPOLL in userfaultfd_util reader threads and signal errors via TEST_ASSERT Anish Moorthy
2023-04-19 13:36   ` Hoo Robert
2023-04-19 23:26     ` Anish Moorthy
2023-04-12 21:34 ` [PATCH v3 03/22] KVM: Allow hva_pfn_fast() to resolve read-only faults Anish Moorthy
2023-04-12 21:34 ` [PATCH v3 04/22] KVM: x86: Set vCPU exit reason to KVM_EXIT_UNKNOWN at the start of KVM_RUN Anish Moorthy
2023-05-02 17:17   ` Anish Moorthy
2023-05-02 18:51     ` Sean Christopherson
2023-05-02 19:49       ` Anish Moorthy
2023-05-02 20:41         ` Sean Christopherson
2023-05-02 21:46           ` Anish Moorthy
2023-05-02 22:31             ` Sean Christopherson
2023-04-12 21:34 ` [PATCH v3 05/22] KVM: Add KVM_CAP_MEMORY_FAULT_INFO Anish Moorthy
2023-04-19 13:57   ` Hoo Robert
2023-04-20 18:09     ` Anish Moorthy
2023-04-21 12:28       ` Robert Hoo
2023-06-01 19:52   ` Oliver Upton
2023-06-01 20:30     ` Anish Moorthy
2023-06-01 21:29       ` Oliver Upton
2023-07-04 10:10   ` Kautuk Consul
2023-04-12 21:34 ` [PATCH v3 06/22] KVM: Add docstrings to __kvm_write_guest_page() and __kvm_read_guest_page() Anish Moorthy
2023-04-12 21:34 ` [PATCH v3 07/22] KVM: Annotate -EFAULTs from kvm_vcpu_write_guest_page() Anish Moorthy
2023-04-20 20:52   ` Peter Xu
2023-04-20 23:29     ` Anish Moorthy
2023-04-21 15:00       ` Peter Xu
2023-04-12 21:34 ` [PATCH v3 08/22] KVM: Annotate -EFAULTs from kvm_vcpu_read_guest_page() Anish Moorthy
2023-04-12 21:34 ` [PATCH v3 09/22] KVM: Annotate -EFAULTs from kvm_vcpu_map() Anish Moorthy
2023-04-20 20:53   ` Peter Xu
2023-04-20 23:34     ` Anish Moorthy
2023-04-21 14:58       ` Peter Xu
2023-04-12 21:34 ` [PATCH v3 10/22] KVM: x86: Annotate -EFAULTs from kvm_mmu_page_fault() Anish Moorthy
2023-04-12 21:34 ` [PATCH v3 11/22] KVM: x86: Annotate -EFAULTs from setup_vmgexit_scratch() Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 12/22] KVM: x86: Annotate -EFAULTs from kvm_handle_page_fault() Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 13/22] KVM: x86: Annotate -EFAULTs from kvm_hv_get_assist_page() Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 14/22] KVM: x86: Annotate -EFAULTs from kvm_pv_clock_pairing() Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 15/22] KVM: x86: Annotate -EFAULTs from direct_map() Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 16/22] KVM: x86: Annotate -EFAULTs from kvm_handle_error_pfn() Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 17/22] KVM: Introduce KVM_CAP_ABSENT_MAPPING_FAULT without implementation Anish Moorthy
2023-04-19 14:00   ` Hoo Robert
2023-04-20 18:23     ` Anish Moorthy
2023-04-24 21:02   ` Sean Christopherson
2023-06-01 16:04     ` Oliver Upton
2023-06-01 18:19   ` Oliver Upton
2023-06-01 18:59     ` Sean Christopherson
2023-06-01 19:29       ` Oliver Upton
2023-06-01 19:34         ` Sean Christopherson
2023-04-12 21:35 ` [PATCH v3 18/22] KVM: x86: Implement KVM_CAP_ABSENT_MAPPING_FAULT Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 19/22] KVM: arm64: Annotate (some) -EFAULTs from user_mem_abort() Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 20/22] KVM: arm64: Implement KVM_CAP_ABSENT_MAPPING_FAULT Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 21/22] KVM: selftests: Add memslot_flags parameter to memstress_create_vm() Anish Moorthy
2023-04-12 21:35 ` [PATCH v3 22/22] KVM: selftests: Handle memory fault exits in demand_paging_test Anish Moorthy
2023-04-19 14:09   ` Hoo Robert
2023-04-19 16:40     ` Anish Moorthy
2023-04-20 22:47     ` Anish Moorthy
2023-04-27 15:48   ` James Houghton
2023-05-01 18:01     ` Anish Moorthy
2023-04-19 19:55 ` [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults Peter Xu
2023-04-19 20:15   ` Axel Rasmussen
2023-04-19 21:05     ` Peter Xu
2023-04-19 21:53       ` Anish Moorthy
2023-04-20 21:29         ` Peter Xu
2023-04-21 16:58           ` Anish Moorthy
2023-04-21 17:39           ` Nadav Amit
2023-04-24 17:54             ` Anish Moorthy
2023-04-24 19:44               ` Nadav Amit
2023-04-24 20:35                 ` Sean Christopherson
2023-04-24 23:47                   ` Nadav Amit
2023-04-25  0:26                     ` Sean Christopherson
2023-04-25  0:37                       ` Nadav Amit
2023-04-25  0:15                 ` Anish Moorthy
2023-04-25  0:54                   ` Nadav Amit
2023-04-27 16:38                     ` James Houghton
2023-04-27 20:26                   ` Peter Xu
2023-05-03 19:45                     ` Anish Moorthy
2023-05-03 20:09                       ` Sean Christopherson
2023-05-03 21:18                       ` Peter Xu
2023-05-03 21:27                         ` Peter Xu [this message]
2023-05-03 21:42                           ` Sean Christopherson
2023-05-03 23:45                             ` Peter Xu
2023-05-04 19:09                               ` Peter Xu
2023-05-05 18:32                                 ` Anish Moorthy
2023-05-08  1:23                                   ` Peter Xu
2023-05-09 20:52                                     ` Anish Moorthy
2023-05-10 21:50                                       ` Peter Xu
2023-05-11 17:17                                         ` David Matlack
2023-05-11 17:33                                           ` Axel Rasmussen
2023-05-11 19:05                                             ` David Matlack
2023-05-11 19:45                                               ` Axel Rasmussen
2023-05-15 15:16                                                 ` Peter Xu
2023-05-15 15:05                                             ` Peter Xu
2023-05-15 17:16                                         ` Anish Moorthy
2023-05-05 20:05                               ` Nadav Amit
2023-05-08  1:12                                 ` Peter Xu
2023-04-20 23:42         ` Anish Moorthy
2023-05-09 22:19 ` David Matlack
2023-05-10 16:35   ` Anish Moorthy
2023-05-10 22:35   ` Sean Christopherson
2023-05-10 23:44     ` Anish Moorthy
2023-05-23 17:49     ` Anish Moorthy
2023-06-01 22:43       ` Oliver Upton
