kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/8] KVM: Resolve memslot ID via a hash table instead of via a static array
Date: Sat, 22 May 2021 13:11:30 +0200	[thread overview]
Message-ID: <5887de10-c615-175b-e491-86f94e542425@maciej.szmigiero.name> (raw)
In-Reply-To: <c7ba42ee-dc70-a86c-aeb2-d410c136a5ec@maciej.szmigiero.name>

On 21.05.2021 09:05, Maciej S. Szmigiero wrote:
> On 20.05.2021 00:31, Sean Christopherson wrote:
>> On Sun, May 16, 2021, Maciej S. Szmigiero wrote:
(..)
>>>           new_size = old_size;
>>>       slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
>>> -    if (likely(slots))
>>> -        memcpy(slots, old, old_size);
>>> +    if (unlikely(!slots))
>>> +        return NULL;
>>> +
>>> +    memcpy(slots, old, old_size);
>>> +
>>> +    hash_init(slots->id_hash);
>>> +    kvm_for_each_memslot(memslot, slots)
>>> +        hash_add(slots->id_hash, &memslot->id_node, memslot->id);
>>
>> What's the perf penalty if the number of memslots gets large?  I ask because the
>> lazy rmap allocation is adding multiple calls to kvm_dup_memslots().
> 
> I would expect the "move inactive" benchmark to be closest to measuring
> the performance of just a memslot array copy operation but the results
> suggest that the performance stays within ~10% window from 10 to 509
> memslots on the old code (it then climbs 13x for 32k case).
> 
> That suggests that something else is dominating this benchmark for these
> memslot counts (probably zapping of shadow pages).
> 
> At the same time, the tree-based memslots implementation is clearly
> faster in this benchmark, even for smaller memslot counts, so apparently
> copying of the memslot array has some performance impact, too.
> 
> Measuring just kvm_dup_memslots() performance would probably be done
> best by benchmarking KVM_MR_FLAGS_ONLY operation - will try to add this
> operation to my set of benchmarks and see how it performs with different
> memslot counts.

Update:
I've implemented a simple KVM_MR_FLAGS_ONLY benchmark, that repeatably
sets and unsets KVM_MEM_LOG_DIRTY_PAGES flag on a memslot with a single
page of memory in it. [1]

Since on the current code with higher memslot counts the "set flags"
operation spends a significant time in kvm_mmu_calculate_default_mmu_pages()
a second set of measurements was done with patch [2] applied.

In this case, the top functions in the perf trace are "memcpy" and
"clear_page" (called from kvm_set_memslot(), most likely from inlined
kvm_dup_memslots()).

For reference, a set of measurements with the whole patch series
(patches 1 - 8) applied was also done, as "new code".
In this case, SRCU-related functions dominate the perf trace.

32k memslots:
Current code:             0.00130s
Current code + patch [2]: 0.00104s (13x 4k result)
New code:                 0.0000144s

4k memslots:
Current code:             0.0000899s
Current code + patch [2]: 0.0000799s (+78% 2k result)
New code:                 0.0000144s

2k memslots:
Current code:             0.0000495s
Current code + patch [2]: 0.0000447s (+54% 509 result)
New code:                 0.0000143s

509 memslots:
Current code:             0.0000305s
Current code + patch [2]: 0.0000290s (+5% 100 result)
New code:                 0.0000141s

100 memslots:
Current code:             0.0000280s
Current code + patch [2]: 0.0000275s (same as for 10 slots)
New code:                 0.0000142s

10 memslots:
Current code:             0.0000272s
Current code + patch [2]: 0.0000272s
New code:                 0.0000141s

Thanks,
Maciej

[1]: The patch against memslot_perf_test.c is available here:
https://github.com/maciejsszmigiero/linux/commit/841e94898a55ff79af9d20a08205aa80808bd2a8

[2]: "[PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array"

  reply	other threads:[~2021-05-22 11:15 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-16 21:44 [PATCH v3 0/8] KVM: Scalable memslots implementation Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array Maciej S. Szmigiero
2021-05-19 21:00   ` Sean Christopherson
2021-05-21  7:03     ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 2/8] KVM: Integrate gfn_to_memslot_approx() into search_memslots() Maciej S. Szmigiero
2021-05-19 21:24   ` Sean Christopherson
2021-05-21  7:03     ` Maciej S. Szmigiero
2021-06-10 16:17     ` Paolo Bonzini
2021-05-16 21:44 ` [PATCH v3 3/8] KVM: Resolve memslot ID via a hash table instead of via a static array Maciej S. Szmigiero
2021-05-19 22:31   ` Sean Christopherson
2021-05-21  7:05     ` Maciej S. Szmigiero
2021-05-22 11:11       ` Maciej S. Szmigiero [this message]
2021-05-16 21:44 ` [PATCH v3 4/8] KVM: Introduce memslots hva tree Maciej S. Szmigiero
2021-05-19 23:07   ` Sean Christopherson
2021-05-21  7:06     ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 5/8] KVM: s390: Introduce kvm_s390_get_gfn_end() Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 6/8] KVM: Keep memslots in tree-based structures instead of array-based ones Maciej S. Szmigiero
2021-05-19 23:10   ` Sean Christopherson
2021-05-21  7:06     ` Maciej S. Szmigiero
2021-05-25 23:21   ` Sean Christopherson
2021-06-01 20:24     ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 7/8] KVM: Optimize gfn lookup in kvm_zap_gfn_range() Maciej S. Szmigiero
2021-05-26 17:33   ` Sean Christopherson
2021-06-01 20:25     ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 8/8] KVM: Optimize overlapping memslots check Maciej S. Szmigiero

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5887de10-c615-175b-e491-86f94e542425@maciej.szmigiero.name \
    --to=mail@maciej.szmigiero.name \
    --cc=aleksandar.qemu.devel@gmail.com \
    --cc=borntraeger@de.ibm.com \
    --cc=chenhuacai@kernel.org \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=imammedo@redhat.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=james.morse@arm.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=paulus@ozlabs.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).