From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>,
Igor Mammedov <imammedo@redhat.com>,
Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
Julien Thierry <julien.thierry.kdev@gmail.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Huacai Chen <chenhuacai@kernel.org>,
Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
Paul Mackerras <paulus@ozlabs.org>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
David Hildenbrand <david@redhat.com>,
Cornelia Huck <cohuck@redhat.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Joerg Roedel <joro@8bytes.org>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/8] KVM: Resolve memslot ID via a hash table instead of via a static array
Date: Sat, 22 May 2021 13:11:30 +0200 [thread overview]
Message-ID: <5887de10-c615-175b-e491-86f94e542425@maciej.szmigiero.name> (raw)
In-Reply-To: <c7ba42ee-dc70-a86c-aeb2-d410c136a5ec@maciej.szmigiero.name>
On 21.05.2021 09:05, Maciej S. Szmigiero wrote:
> On 20.05.2021 00:31, Sean Christopherson wrote:
>> On Sun, May 16, 2021, Maciej S. Szmigiero wrote:
(..)
>>> new_size = old_size;
>>> slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
>>> - if (likely(slots))
>>> - memcpy(slots, old, old_size);
>>> + if (unlikely(!slots))
>>> + return NULL;
>>> +
>>> + memcpy(slots, old, old_size);
>>> +
>>> + hash_init(slots->id_hash);
>>> + kvm_for_each_memslot(memslot, slots)
>>> + hash_add(slots->id_hash, &memslot->id_node, memslot->id);
>>
>> What's the perf penalty if the number of memslots gets large? I ask because the
>> lazy rmap allocation is adding multiple calls to kvm_dup_memslots().
>
> I would expect the "move inactive" benchmark to be closest to measuring
> the performance of just a memslot array copy operation but the results
> suggest that the performance stays within ~10% window from 10 to 509
> memslots on the old code (it then climbs 13x for 32k case).
>
> That suggests that something else is dominating this benchmark for these
> memslot counts (probably zapping of shadow pages).
>
> At the same time, the tree-based memslots implementation is clearly
> faster in this benchmark, even for smaller memslot counts, so apparently
> copying of the memslot array has some performance impact, too.
>
> Measuring just kvm_dup_memslots() performance would probably be done
> best by benchmarking KVM_MR_FLAGS_ONLY operation - will try to add this
> operation to my set of benchmarks and see how it performs with different
> memslot counts.
Update:
I've implemented a simple KVM_MR_FLAGS_ONLY benchmark, that repeatably
sets and unsets KVM_MEM_LOG_DIRTY_PAGES flag on a memslot with a single
page of memory in it. [1]
Since on the current code with higher memslot counts the "set flags"
operation spends a significant time in kvm_mmu_calculate_default_mmu_pages()
a second set of measurements was done with patch [2] applied.
In this case, the top functions in the perf trace are "memcpy" and
"clear_page" (called from kvm_set_memslot(), most likely from inlined
kvm_dup_memslots()).
For reference, a set of measurements with the whole patch series
(patches 1 - 8) applied was also done, as "new code".
In this case, SRCU-related functions dominate the perf trace.
32k memslots:
Current code: 0.00130s
Current code + patch [2]: 0.00104s (13x 4k result)
New code: 0.0000144s
4k memslots:
Current code: 0.0000899s
Current code + patch [2]: 0.0000799s (+78% 2k result)
New code: 0.0000144s
2k memslots:
Current code: 0.0000495s
Current code + patch [2]: 0.0000447s (+54% 509 result)
New code: 0.0000143s
509 memslots:
Current code: 0.0000305s
Current code + patch [2]: 0.0000290s (+5% 100 result)
New code: 0.0000141s
100 memslots:
Current code: 0.0000280s
Current code + patch [2]: 0.0000275s (same as for 10 slots)
New code: 0.0000142s
10 memslots:
Current code: 0.0000272s
Current code + patch [2]: 0.0000272s
New code: 0.0000141s
Thanks,
Maciej
[1]: The patch against memslot_perf_test.c is available here:
https://github.com/maciejsszmigiero/linux/commit/841e94898a55ff79af9d20a08205aa80808bd2a8
[2]: "[PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array"
next prev parent reply other threads:[~2021-05-22 11:15 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-16 21:44 [PATCH v3 0/8] KVM: Scalable memslots implementation Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array Maciej S. Szmigiero
2021-05-19 21:00 ` Sean Christopherson
2021-05-21 7:03 ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 2/8] KVM: Integrate gfn_to_memslot_approx() into search_memslots() Maciej S. Szmigiero
2021-05-19 21:24 ` Sean Christopherson
2021-05-21 7:03 ` Maciej S. Szmigiero
2021-06-10 16:17 ` Paolo Bonzini
2021-05-16 21:44 ` [PATCH v3 3/8] KVM: Resolve memslot ID via a hash table instead of via a static array Maciej S. Szmigiero
2021-05-19 22:31 ` Sean Christopherson
2021-05-21 7:05 ` Maciej S. Szmigiero
2021-05-22 11:11 ` Maciej S. Szmigiero [this message]
2021-05-16 21:44 ` [PATCH v3 4/8] KVM: Introduce memslots hva tree Maciej S. Szmigiero
2021-05-19 23:07 ` Sean Christopherson
2021-05-21 7:06 ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 5/8] KVM: s390: Introduce kvm_s390_get_gfn_end() Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 6/8] KVM: Keep memslots in tree-based structures instead of array-based ones Maciej S. Szmigiero
2021-05-19 23:10 ` Sean Christopherson
2021-05-21 7:06 ` Maciej S. Szmigiero
2021-05-25 23:21 ` Sean Christopherson
2021-06-01 20:24 ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 7/8] KVM: Optimize gfn lookup in kvm_zap_gfn_range() Maciej S. Szmigiero
2021-05-26 17:33 ` Sean Christopherson
2021-06-01 20:25 ` Maciej S. Szmigiero
2021-05-16 21:44 ` [PATCH v3 8/8] KVM: Optimize overlapping memslots check Maciej S. Szmigiero
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5887de10-c615-175b-e491-86f94e542425@maciej.szmigiero.name \
--to=mail@maciej.szmigiero.name \
--cc=aleksandar.qemu.devel@gmail.com \
--cc=borntraeger@de.ibm.com \
--cc=chenhuacai@kernel.org \
--cc=cohuck@redhat.com \
--cc=david@redhat.com \
--cc=frankja@linux.ibm.com \
--cc=imammedo@redhat.com \
--cc=imbrenda@linux.ibm.com \
--cc=james.morse@arm.com \
--cc=jmattson@google.com \
--cc=joro@8bytes.org \
--cc=julien.thierry.kdev@gmail.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=paulus@ozlabs.org \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
--cc=suzuki.poulose@arm.com \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).